Abstract: Given the rapid increase in traffic, greater demands have been put on research in high-speed switching systems. Such systems have to simultaneously meet several constraints, e.g., high throughput, low delay and low complexity. This makes it challenging to design an efficient scheduling algorithm, and has consequently drawn considerable research interest. However, previous results either cannot provide a 100% throughput guarantee without a speedup, or require a complex centralized scheduler. In this paper, we design a distributed 100% throughput algorithm for crosspoint buffered switches, called DISQUO, with very limited message passing. We prove that DISQUO can achieve 100% throughput for any admissible Bernoulli traffic, with a low time complexity of $O(1)$ per port and an exchange of only a few bits of messages in every time slot. To the best of our knowledge, it is the first distributed algorithm that can provide 100% throughput for a crosspoint buffered switch.

I. Introduction 

With the growing Internet traffic demand, there is an increasing interest in designing large-scale high-performance packet switches. There is also a growing need for high-speed switching in the backplane of multiprocessing high-performance computer architectures [1], [2], and in large data centers [3], [4].

A scheduling algorithm is needed to schedule packet transmissions in such a system. A good algorithm has to meet several requirements, e.g., high throughput, low delay, and low complexity. To meet these requirements, such switches usually require centralized, sometimes complex, algorithms. The well-known maximum weight matching (MWM) algorithm [5], [6] can achieve 100% throughput for any admissible arrival traffic, but it is impractical to implement due to its high computational complexity ($O(N^3)$). Also, the MWM algorithm needs a centralized scheduler, which increases the implementation complexity and leads to communication overhead. A number of practical iterative algorithms have been proposed, such as iSLIP [7] and DRRM [8]. However, they cannot guarantee 100% throughput for general arrival traffic patterns.

Due to the memory speed limit, most current switches use input queuing (IQ) [5], [7], [9], [10] or combined input and output queuing (CIOQ) [11]. To address the high complexity of designing scheduling algorithms for input-queued switching architectures, the crosspoint buffered switching architecture has been proposed, which promises simpler scheduling algorithms and better delay performance [12]-[14]. For a crosspoint buffered switch, each input maintains virtual output queues (VOQs), one for each output, and each crosspoint contains a finite buffer. With a speedup of 2, the authors in [15], [16] showed that a crosspoint buffered switch can provide 100% throughput under any admissible traffic. However, without speedup, previous 100% throughput results are limited to uniform traffic loads. Under uniform traffic, it has been shown that longest queue first at the input port and round-robin at the output port (LQF-RR) [13], or a simple round-robin scheduler at both input and output ports (RR-RR) [12], guarantees 100% throughput. In [17], the authors proposed a distributed scheduling algorithm and derived a relationship between throughput and the size of crosspoint buffers. However, to achieve 100% throughput, an infinite-size crosspoint buffer was needed. To our knowledge, there is no distributed algorithm that can achieve 100% throughput for a finite crosspoint buffer without speedup.

There has always been a close relationship between the field of switch scheduling and scheduling transmissions in wireless ad hoc networks [5], [18]. Recently, it has been shown that CSMA-like algorithms [19]-[24] can achieve the maximum throughput in wireless ad hoc networks. Stations only need to sense the channel and make their scheduling decisions based on local queue information. These algorithms are easy to implement since no message passing is required. They are the first distributed algorithms that can achieve the maximum theoretical capacity in wireless networks.

Inspired by the CSMA-like algorithms and using many of the techniques pioneered by them, we propose a distributed algorithm for crosspoint buffered switches that can stabilize the system under any admissible Bernoulli traffic. Note that for such CSMA-like algorithms to work properly, a node has to know its neighbors' states in the previous slot by carrier sensing. This can be achieved in a wireless network due to the broadcast property of the medium. However, it cannot be easily implemented in a switching system. We make a key observation that the crosspoint buffers can be used for implicit message passing. By observing the buffer states, an input can get some partial information on whether the corresponding outputs are busy or not. This requires no change in the switch fabric architecture or implementation. Based on this observation, we designed DISQUO, the Distributed QUeue input-Output scheduler. With DISQUO, an input only uses its local queue information and the locally observable partial schedule in the previous time slot to make its scheduling decision. We prove the stability of the system and evaluate the performance of DISQUO by running extensive simulations. For technical reasons, each input does need to know the global maximum queue size in the system, which requires some message exchange in each time slot. However, the simulations we conducted show that without the explicit message passing, the algorithm can still stabilize the system for the traffic patterns that we tested. Therefore, we state the fully distributed algorithm without the global maximum queue size information as a conjecture. This result fulfills the long-sought promise of this architecture conjectured in [12], [13], [15]-[17]. The simulation results also show that it can provide good delay performance, comparable to output-queued switches, under different types of traffic.

The rest of the paper is organized as follows. Some theoretical preliminaries are presented in the next section. We present DISQUO in Section III and prove the system stability. Simulation results are presented in Section IV. We give the proof of system convergence in Section D and conclude the paper in Section V.

II. Preliminaries 

In this section, we introduce the notation and preliminary results that we will use in the theoretical proof of our algorithms.



Definition 1. Consider a graph $G(V, E)$, with $W = [W_i]_{i \in V}$ a vector of weights associated with the vertices. Glauber dynamics [25] is a Markov chain over $\mathcal{I}(G)$. Suppose that the chain is at state $X(n-1) = [X_i(n-1)]_{i \in V}$. The next transition of Glauber dynamics follows the rules:

• Select a vertex $i \in V$ uniformly at random; all other vertices keep their states.
• If $\forall j \in \mathcal{N}(i)$, $X_j(n-1) = 0$, then
$$X_i(n) = \begin{cases} 1 & \text{with probability } \frac{e^{W_i}}{1 + e^{W_i}}, \\ 0 & \text{otherwise.} \end{cases}$$
• Otherwise, $X_i(n) = 0$.

The Glauber dynamics is irreducible, aperiodic and time-reversible over $\mathcal{I}(G)$ [25]. It has a product-form stationary distribution, which is:
$$\pi(X) = \frac{1}{Z}\exp\Big(\sum_{i \in V} W_i X_i\Big), \quad X \in \mathcal{I}(G), \tag{1}$$
where $Z$ is a normalizing constant that makes the probabilities sum to one.
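To make Definition 1 and Eq. (1) concrete, the following Python sketch (an illustration, not code from this paper) builds the exact transition matrix of single-site Glauber dynamics on a small graph and checks numerically that the product-form distribution of Eq. (1) is stationary. The 4-vertex path graph and the weights `w` are arbitrary choices made for illustration.

```python
import itertools
import math


def independent_sets(n, edges):
    """All independent sets of a graph on vertices 0..n-1, as 0/1 tuples."""
    return [x for x in itertools.product((0, 1), repeat=n)
            if all(not (x[u] and x[v]) for u, v in edges)]


def glauber_matrix(n, edges, w):
    """Exact transition matrix of single-site Glauber dynamics (Definition 1)."""
    nbrs = {i: set() for i in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    states = independent_sets(n, edges)
    index = {s: k for k, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        for i in range(n):  # vertex i is selected with probability 1/n
            free = all(s[j] == 0 for j in nbrs[i])
            p_on = math.exp(w[i]) / (1.0 + math.exp(w[i])) if free else 0.0
            for val, prob in ((1, p_on), (0, 1.0 - p_on)):
                if prob > 0.0:
                    t = list(s)
                    t[i] = val  # all other vertices keep their state
                    P[index[s]][index[tuple(t)]] += prob / n
    return states, P


def product_form(states, w):
    """Eq. (1): pi(X) proportional to exp(sum_i W_i X_i)."""
    raw = [math.exp(sum(wi * xi for wi, xi in zip(w, s))) for s in states]
    z = sum(raw)
    return [r / z for r in raw]


edges = [(0, 1), (1, 2), (2, 3)]   # 4-vertex path, chosen for illustration
w = [0.5, 1.2, -0.3, 2.0]          # hypothetical vertex weights
states, P = glauber_matrix(4, edges, w)
pi = product_form(states, w)
pi_next = [sum(pi[a] * P[a][b] for a in range(len(states)))
           for b in range(len(states))]
print(max(abs(u - v) for u, v in zip(pi, pi_next)))  # rounding-level: pi P = pi
```

Stationarity holds here because each single-site move satisfies detailed balance: the ratio $p_{on}/(1 - p_{on}) = e^{W_i}$ equals the ratio of the two product-form probabilities.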



A. Glauber Dynamics 

A sequence of random variables $(X_0, X_1, \cdots)$ is a Markov chain with state space $\Omega$ and transition matrix $P$ if for all $x, y \in \Omega$, all $n \ge 1$, and all events $H_{n-1} = \bigcap_{s=0}^{n-1}\{X_s = x_s\}$, we have
$$P\{X_{n+1} = y \mid \{X_n = x\} \cap H_{n-1}\} = P\{X_{n+1} = y \mid X_n = x\} = P(x, y).$$
The process can then be described as:
$$\mu(r) = \mu(r-1)P = \mu(0)P^r,$$
where $\mu(r)$ is the probability distribution of $X_r$.

The Markov chain is irreducible if any state can reach any other state. If, starting from any state $X$, the chain returns to $X$ with probability 1 and the expected return time is finite, the Markov chain is positive recurrent. If the Markov chain is irreducible and positive recurrent, it has a unique stationary distribution $\pi$; if it is also aperiodic, then
$$\lim_{r \to \infty} \mu(r) = \pi.$$

Let $P^*$ denote the transition probability matrix for the reverse Markov chain $(\cdots, X_n, X_{n-1}, \cdots)$. If $P = P^*$, the Markov chain is called time-reversible [25].

A graph $G = (V, E)$ consists of a vertex set $V$ and an edge set $E$, where the elements of $E$ are unordered pairs of vertices: $E \subseteq \{\{x, y\} : x, y \in V, x \neq y\}$. If $\{x, y\} \in E$, $y$ is a neighbor of $x$ (and also $x$ is a neighbor of $y$). Let $\mathcal{N}(x)$ denote all the neighbors of $x$. An independent set $I \subseteq V$ is a set such that if $x \in I$, then $y \notin I$ for all $y \in \mathcal{N}(x)$. Let $\mathcal{I}(G)$ represent all the independent sets of $G$.



B. Mixing Time 



Glauber dynamics converges to its stationary distribution starting from any initial distribution. To characterize the convergence speed, we need to quantify the time that it takes for Glauber dynamics to get close to its stationary distribution. To establish the result, we first need to define some notation.

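Definition 2. (Distance of probability distributions) Given two probability distributions $\mu$ and $\nu$ over a finite space $\Omega$, the total variation (TV) distance is defined as:
$$\|\mu - \nu\|_{TV} = \frac{1}{2}\sum_{x \in \Omega}|\mu(x) - \nu(x)|, \tag{2}$$
and the $\chi^2$ distance [26], represented as $\|\frac{\nu}{\mu} - 1\|_{2,\mu}$, is defined as follows. For any two vectors $u, v \in \mathbb{R}^{|\Omega|}$, we define:
$$\langle u, v \rangle_{\mu} = \sum_{x \in \Omega}\mu(x)u(x)v(x), \tag{3}$$
$$\|v\|_{2,\mu} = \sqrt{\langle v, v \rangle_{\mu}}. \tag{4}$$
The $\chi^2$ distance is then $\|v\|_{2,\mu}$ with $v(x) = \nu(x)/\mu(x) - 1$.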

Following the definition, the probability distances satisfy the following condition.

Lemma 1. [26] Given two probability distributions $\mu$ and $\nu$ over a finite space $\Omega$, the following inequality holds:
$$\Big\|\frac{\nu}{\mu} - 1\Big\|_{2,\mu} \ge 2\|\nu - \mu\|_{TV}. \tag{5}$$
Fig. 1. An example of a crosspoint buffered switch. Each input has virtual output queues (VOQs). There is a buffer with a size of K at each crosspoint of the fabric.



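Proof:
$$\Big\|\frac{\nu}{\mu} - 1\Big\|_{2,\mu} = \sqrt{\sum_{x \in \Omega}\mu(x)\Big(\frac{\nu(x)}{\mu(x)} - 1\Big)^2} \ge \sum_{x \in \Omega}\mu(x)\Big|\frac{\nu(x)}{\mu(x)} - 1\Big| \quad \text{(using the Cauchy-Schwarz inequality)}$$
$$= \sum_{x \in \Omega}|\nu(x) - \mu(x)| = 2\|\nu - \mu\|_{TV}. \tag{6}$$
■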
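A quick numerical sanity check of Definition 2 and Lemma 1, as a Python sketch with made-up distributions `mu` and `nu`. It assumes $\mu(x) > 0$ everywhere, as the $\chi^2$ distance requires.

```python
import math


def tv_distance(mu, nu):
    """Total variation distance, Eq. (2)."""
    return 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))


def chi2_distance(nu, mu):
    """||nu/mu - 1||_{2,mu} from Eqs. (3)-(4); requires mu(x) > 0 for all x."""
    return math.sqrt(sum(m * (v / m - 1.0) ** 2 for m, v in zip(mu, nu)))


mu = [0.1, 0.2, 0.3, 0.4]      # illustrative distributions on a 4-point space
nu = [0.25, 0.25, 0.25, 0.25]
tv = tv_distance(mu, nu)
chi = chi2_distance(nu, mu)
print(tv, chi, chi >= 2 * tv)  # Lemma 1: chi >= 2 * tv
```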



Definition 3. [25] (Mixing time) For a Markov chain with a transition probability matrix $P$ and a stationary distribution $\pi$, define the distance:
$$d(r) := \max_{\mu(0)}\|\mu(0)P^r - \pi\|_{TV}. \tag{7}$$
The mixing time is defined as:
$$T_{mix}(\delta) := \min\{r : d(r) \le \delta\}. \tag{8}$$



From the definition, we can see that the mixing time is a parameter that measures the convergence rate of a Markov chain to its stationary distribution. Also, following Lemma 1, the mixing time can be bounded by calculating the $\chi^2$ distance between $\mu(r)$ and $\pi$.
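The definitions in Eqs. (7)-(8) can be evaluated directly for a small chain. The Python sketch below is illustrative only: the two-state matrix `P` is a made-up example, and the maximum over $\mu(0)$ is taken over point masses, which suffices because the TV distance is convex in $\mu(0)$.

```python
def tv(mu, nu):
    """Total variation distance, Eq. (2)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def step(mu, P):
    """One step of mu(r) = mu(r - 1) P for a row-stochastic P."""
    n = len(P)
    return [sum(mu[a] * P[a][b] for a in range(n)) for b in range(n)]


def mixing_time(P, pi, delta, r_max=10_000):
    """T_mix(delta) = min{r : d(r) <= delta}, Eqs. (7)-(8)."""
    n = len(P)
    # d(r) is attained at a point-mass initial distribution, so track those only
    mus = [[1.0 if k == s else 0.0 for k in range(n)] for s in range(n)]
    for r in range(r_max + 1):
        if max(tv(mu, pi) for mu in mus) <= delta:
            return r
        mus = [step(mu, P) for mu in mus]
    raise RuntimeError("d(r) did not drop below delta within r_max steps")


P = [[0.9, 0.1], [0.2, 0.8]]   # illustrative two-state chain
pi = [2 / 3, 1 / 3]            # its stationary distribution
print(mixing_time(P, pi, 0.25))  # prints 3
```

For this chain $d(r) = \frac{2}{3}(0.7)^r$, so $d(2) \approx 0.33$ and $d(3) \approx 0.23$, giving $T_{mix}(0.25) = 3$.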



III. DISQUO: A Distributed Algorithm for a 
Crosspoint Buffered Switch 

In this section, we present DISQUO for a crosspoint buffered switch. The algorithm is fully distributed: inputs and outputs utilize the states of crosspoint buffers to implicitly exchange information. We will prove the system stability for any admissible Bernoulli traffic, and evaluate the delay performance by running extensive simulations for different traffic patterns.



A. Crosspoint Buffered Switch 

With today's ASIC technology, it is now possible to add a small buffer at each crosspoint inside the crossbar (see Fig. 1). This makes the crosspoint buffered, or combined input and crossbar queueing (CICQ), switch a much more attractive architecture since its scheduler is potentially much simpler. The input and output schedulers can be independent. First, each input picks a crosspoint buffer to send a packet to. Then, each output picks a crosspoint buffer to transmit a packet from. However, existing algorithms [12], [13], [27], [28] either cannot guarantee 100% throughput or require a centralized scheduler.

An $N \times N$ crosspoint buffered switch is shown in Fig. 1. We assume fixed-size packet (cell) switching. Variable-size packets can be segmented into cells before switching and reassembled at the output ports. There are virtual output queues (VOQs) at the inputs to prevent head-of-line blocking. Each input maintains $N$ VOQs, one for each output. Let $VOQ_{ij}$ represent the VOQ at input $i$ for output $j$, and $Q_{ij}(n)$ the queue length of $VOQ_{ij}$ at time $n$. Let $(i, j)$ represent the crosspoint between input $i$ and output $j$.

Each crosspoint has a buffer of size $K$. Most current implementations are constrained by the buffer size. However, it turns out that $K = 1$ is sufficient for DISQUO. We will therefore assume that $K = 1$ in the following. Our algorithm can be easily extended to the case when $K > 1$. Let $CB_{ij}$ denote the buffer at the crosspoint between input $i$ and output $j$. $B_{ij}(n) \in \{0, 1\}$ denotes the occupancy of $CB_{ij}$ at time $n$.

A schedule can be represented by $S(n) = [S^I(n), S^O(n)]$. $S^I(n) = [S^I_{ij}(n)]$ is the input schedule. Each input port can transmit at most one cell in each time slot. Thus the input schedule is subject to the following constraints:
$$\sum_j S^I_{ij}(n) \le 1, \quad S^I_{ij}(n) = 0 \text{ if } B_{ij}(n) = 1. \tag{9}$$
$S^O(n) = [S^O_{ij}(n)]$ is the output schedule. It has to satisfy the following constraints:
$$\sum_i S^O_{ij}(n) \le 1, \quad S^O_{ij}(n) = 0 \text{ if } B_{ij}(n) = 0. \tag{10}$$



Let $\lambda_{ij}$ represent the arrival rate of traffic between input $i$ and output $j$. We assume that the arrival process is i.i.d. Bernoulli.

Definition 4. An arrival process is said to be admissible if it satisfies:
$$\sum_i \lambda_{ij} < 1, \quad \text{and} \quad \sum_j \lambda_{ij} < 1. \tag{11}$$
Let $\sigma^*$ denote traffic for which equality holds in Eq. (11). It is easy to verify that for any admissible traffic $\sigma = [\lambda_{ij}]$, there exists an $\epsilon > 0$ such that $\sigma \le (1 - \epsilon)\sigma^*$ component-wise.
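The claim that an admissible $\sigma$ is dominated by $(1 - \epsilon)\sigma^*$ can be made constructive. The Python sketch below is one way to do it (our illustration, not the paper's): take $1 - \epsilon$ as the largest row or column sum, scale $\sigma$ by $1/(1 - \epsilon)$, and fill the remaining row and column deficits with the north-west-corner rule from the transportation problem, which yields a doubly stochastic $\sigma^*$. The rate matrix is made up.

```python
def admissible_margin(rates):
    """1 - (largest row or column sum); Eq. (11) holds iff this is positive."""
    n = len(rates)
    rows = [sum(rates[i]) for i in range(n)]
    cols = [sum(rates[i][j] for i in range(n)) for j in range(n)]
    return 1.0 - max(rows + cols)


def dominating_sigma_star(rates, tol=1e-12):
    """Return (eps, sigma_star) with rates <= (1 - eps) * sigma_star entrywise
    and sigma_star doubly stochastic."""
    n = len(rates)
    eps = admissible_margin(rates)
    if eps <= 0:
        raise ValueError("traffic is not admissible")
    m = [[r / (1.0 - eps) for r in row] for row in rates]  # all line sums <= 1
    row_def = [1.0 - sum(m[i]) for i in range(n)]
    col_def = [1.0 - sum(m[i][j] for i in range(n)) for j in range(n)]
    # North-west-corner rule: total row deficit equals total column deficit.
    i = j = 0
    while i < n and j < n:
        d = max(0.0, min(row_def[i], col_def[j]))
        m[i][j] += d
        row_def[i] -= d
        col_def[j] -= d
        if row_def[i] <= tol:
            i += 1
        if col_def[j] <= tol:
            j += 1
    return eps, m


rates = [[0.5, 0.2, 0.1],      # made-up arrival rates lambda_ij
         [0.1, 0.3, 0.2],
         [0.2, 0.1, 0.4]]
eps, sigma_star = dominating_sigma_star(rates)
print(round(eps, 6), [[round(v, 4) for v in row] for row in sigma_star])
```

The margin computed this way is also the largest possible $\epsilon$, since every row and column of $\sigma^*$ sums to 1.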

Let $\|Q\|$ represent the norm of matrix $Q$: $\|Q\| = \big(\sum_{i,j} Q_{ij}^2\big)^{1/2}$. The stability of a system is defined as follows.

Definition 5. A system of queues is said to be stable if:
$$\limsup_{n \to \infty} E\big[\|Q(n)\|\big] < \infty. \tag{12}$$
Theorem 1. A scheduling algorithm which can stabilize the system for any admissible traffic in a bufferless crossbar switch can also stabilize the system for any admissible traffic in a crosspoint buffered switch.

Proof: Please refer to Property 1 of Ref. [W]. ■

Following Theorem 1, all the scheduling algorithms that have been proposed for an input-queued switch, e.g., maximum weight matching (MWM) [5], can be applied to a crosspoint buffered switch. As we will show later, the reason that DISQUO can stabilize the system for any admissible traffic is that, after the system converges, the schedule generated in every time slot has a weight that approaches that of the maximum weight matching schedule.

B. DISQUO Scheduling Algorithms 

Before presenting the algorithm, we need to introduce some further notation. A DISQUO schedule $X(n)$ is a schedule that is generated by the DISQUO algorithm. It is used to determine the input schedules and output schedules. A DISQUO schedule has the following properties:

Property 1. A DISQUO schedule $X(n)$ can be represented by an $N \times N$ matrix, where $X_{ij}(n) \in \{0, 1\}$, $\sum_j X_{ij}(n) \le 1$, and $\sum_i X_{ij}(n) \le 1$.

With some abuse of notation, we also use $X$ to represent a set, and write $(i, j) \in X$ if $X_{ij} = 1$. Note that a DISQUO schedule $X$ has the property that if $X_{ij} = 1$, then $\forall i' \neq i$, $X_{i'j} = 0$ and $\forall j' \neq j$, $X_{ij'} = 0$. We call these crosspoints the neighbors of $(i, j)$, defined as follows.

Definition 6. The neighbors of a crosspoint $(i, j)$ are defined as:
$$\mathcal{N}(i, j) = \{(i', j) : i' \neq i\} \cup \{(i, j') : j' \neq j\}. \tag{13}$$

A DISQUO schedule $X$ then has the following property:

Property 2. If $(i, j) \in X$, then $\forall (k, l) \in \mathcal{N}(i, j)$, $(k, l) \notin X$.

Let $\mathcal{X}$ represent the set of all DISQUO schedules.

Property 3. At each time slot, when a DISQUO schedule is generated, each input and output port determines its schedule by observing the following rules:

• For input $i$, when $X_{ij}(n) = 1$, if $Q_{ij}(n) > 0$ and $B_{ij}(n-1) = 0$, $S^I_{ij}(n) = 1$. Otherwise, $S^I_{ij}(n) = 0$.

• For output $j$, if $X_{ij}(n) = 1$ and $B_{ij}(n) > 0$, $S^O_{ij}(n) = 1$.

Property 4. For an input $i$, if $\forall j$, $X_{ij} = 0$, then it is referred to as a free input. A free input port has the freedom to pick any eligible crosspoint to serve, i.e., it can transfer a packet to any empty crosspoint buffer.

Property 5. For an output port $j$, if $\forall i$, $X_{ij} = 0$, then it is referred to as a free output. A free output is free to pick any non-empty crosspoint to serve.

Following Properties 3-5, the input schedule $S^I(n)$ and output schedule $S^O(n)$ can be determined after the DISQUO schedule $X(n)$ is generated. Therefore, we next present the basic DISQUO schedule updating algorithm that generates $X(n)$.



At the beginning, set the initial DISQUO schedule $X(0)$ to any schedule that satisfies Property 1. For simplicity, we can set $X(0) = 0$. At the beginning of time slot $n$, generate an input/output permutation $H(n)$ randomly. Then, the DISQUO schedule $X(n)$ is updated following the rules below:

Basic DISQUO Algorithm

○ $\forall (i, j) \notin H(n)$:
    (a) $X_{ij}(n) = X_{ij}(n-1)$.
○ For $(i, j) \in H(n)$:
  - If $(i, j) \in X(n-1)$:
      (b) $X_{ij}(n) = 1$ with probability $p_{ij}$;
      (c) $X_{ij}(n) = 0$ with probability $\bar{p}_{ij} = 1 - p_{ij}$.
  - Else, if $(i, j) \notin X(n-1)$, and $\forall (k, l) \in \mathcal{N}(i, j)$, $X_{kl}(n-1) = 0$, then:
      (d) $X_{ij}(n) = 1$ with probability $p_{ij}$;
      (e) $X_{ij}(n) = 0$ with probability $\bar{p}_{ij} = 1 - p_{ij}$.
  - Else, $(i, j) \notin X(n-1)$ and $\exists (k, l) \in \mathcal{N}(i, j)$ such that $X_{kl}(n-1) = 1$:
      (f) $X_{ij}(n) = 0$.

$p_{ij}$ is defined as $p_{ij} = \frac{\exp(W_{ij}(n))}{1 + \exp(W_{ij}(n))}$, where $W_{ij}(n)$ is a weight function of the queue size, which is defined as
$$W_{ij}(n) = f(\tilde{Q}_{ij}(n)). \tag{14}$$
$f(\cdot)$ is a concave function which we will define later, $Q_{max}(n) = \max_{ij} Q_{ij}(n)$, and $\tilde{Q}_{ij}(n) = \max\{f^{-1}(\frac{\epsilon}{\cdots} f(Q_{max}(n))),\, Q_{ij}(n)\}$. Recall that for any admissible traffic $\sigma$, there exists an $\epsilon > 0$ such that $\sigma \le (1 - \epsilon)\sigma^*$ component-wise. Thus, $\epsilon$ is a small positive number that satisfies the condition $\sigma \le (1 - \epsilon)\sigma^*$.

Note that in our algorithm, $X_{ij}(n)$ can change only when $(i, j)$ is in $H(n)$. Therefore, in every time slot, only crosspoints $(i, j) \in H(n)$ can join or leave the DISQUO schedule.
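A minimal Python sketch of one slot of the Basic DISQUO update, written from rules (a)-(f) above. Two choices are ours and purely illustrative: the concave function is taken as $f(q) = \log(1 + q)$ (the paper defines $f$ later), and the weight uses $Q_{ij}(n)$ directly instead of $\tilde{Q}_{ij}(n)$, i.e., the simplified weight $W_{ij}(n) = f(Q_{ij}(n))$ considered in the Discussions subsection. A schedule is stored as a dict mapping each matched input to its output; `weights` and `basic_disquo_step` are hypothetical helper names.

```python
import math
import random


def weights(q):
    """Hypothetical weights W_ij = f(Q_ij) with f(q) = log(1 + q), our assumption."""
    return [[math.log1p(v) for v in row] for row in q]


def basic_disquo_step(x_prev, h, w, rng):
    """One slot of the Basic DISQUO update, rules (a)-(f).

    x_prev: dict input -> output for crosspoints with X_ij(n-1) = 1
    h:      list, h[i] is the output paired with input i in H(n)
    w:      w[i][j] is the weight W_ij(n)
    rng:    random.Random instance supplying the coin flips
    Returns X(n) as a dict input -> output.
    """
    busy_outputs = set(x_prev.values())
    x_new = {}
    for i, j in enumerate(h):
        p = math.exp(w[i][j]) / (1.0 + math.exp(w[i][j]))  # p_ij
        if x_prev.get(i) == j:
            # (i, j) in X(n-1): stays with prob p_ij (b), leaves otherwise (c)
            if rng.random() < p:
                x_new[i] = j
        elif i not in x_prev and j not in busy_outputs:
            # no neighbour of (i, j) is active: joins with prob p_ij (d), else (e)
            if rng.random() < p:
                x_new[i] = j
        # otherwise an active neighbour blocks (i, j): case (f), X_ij(n) = 0
    for i, j in x_prev.items():
        if h[i] != j:
            x_new[i] = j  # case (a): crosspoints outside H(n) are unchanged
    return x_new


rng = random.Random(1)
n = 4
q = [[3, 0, 7, 1], [2, 5, 0, 0], [0, 1, 4, 9], [6, 2, 2, 0]]  # made-up VOQ sizes
w = weights(q)
x = {}  # X(0) = 0, the empty schedule
for slot in range(50):
    h = list(range(n))
    rng.shuffle(h)  # H(n): a uniformly random input/output permutation
    x = basic_disquo_step(x, h, w, rng)
print(sorted(x.items()))
```

Because $H(n)$ is a permutation, two crosspoints that join in the same slot never share an input or an output, so every returned schedule satisfies Property 1.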

Comparing the algorithm with the updating rules of Glauber dynamics, we can see that the DISQUO schedule is essentially a multiple-update version of Glauber dynamics, with a vector of time-varying weights, since the weight is a function of the queue length, which changes over time. In every time slot, instead of picking only one VOQ, multiple VOQs are picked by $H(n)$ to update their states. As in Glauber dynamics, $X(n)$ only depends on $X(n-1)$. Thus, the DISQUO schedules $X(0), X(1), \cdots$ form a Markov chain. As we will show later, this Markov chain is irreducible, positive recurrent and time-reversible.

After the DISQUO schedule $X(n)$ is generated, inputs and outputs can update their schedules $S^I(n)$ and $S^O(n)$, and start the packet transmissions. As we will prove later, following this algorithm, the system is stable for any admissible Bernoulli i.i.d. traffic.

C. Discussions 

As presented in the previous section, the decision of making a crosspoint $(i, j)$ active or not is based on a probability $p_{ij}$, which depends not only on the local queue sizes, but also on the global information $Q_{max}(n)$. However, since $\epsilon$ is very small, we can use $W_{ij}(n) = f(Q_{ij}(n))$ directly in a real implementation. The introduction of $Q_{max}(n)$ is primarily for technical reasons. Therefore, we have the following conjecture.

Conjecture 1. The DISQUO scheduling algorithm with the weight function defined as $W_{ij}(n) = f(Q_{ij}(n))$ can still stabilize the system for any admissible traffic.

To be precise, we still need some message passing between linecards. As suggested in [19], [26], a rough estimate of $Q_{max}(n)$ is sufficient to guarantee the convergence of the system. Therefore, a low-rate Ethernet connection, which is typical in today's router designs for backplane control message passing, can be used for linecards to broadcast their local maximum queue sizes so that other linecards can estimate the value of $Q_{max}(n)$. At time slot $kN + i$, only linecard $i$ broadcasts its local maximum queue size. Since in every time slot there is at most one packet arrival to and at most one packet departure from an input, the estimate of $Q_{max}(n)$, denoted as $\hat{Q}_{max}(n)$, satisfies $Q_{max}(n) - 2N \le \hat{Q}_{max}(n) \le Q_{max}(n) + 2N$. This is sufficient for the system stability. For details, please refer to Refs. [19], [26].
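The following Python sketch (illustrative only) simulates the round-robin broadcast described above. As a modeling assumption, each linecard's local maximum queue size is a random walk that changes by at most 1 per slot; the hypothetical function `simulate_qmax_estimate` reports the largest observed gap between the estimate and the true $Q_{max}(n)$.

```python
import random


def simulate_qmax_estimate(n, slots, seed=0):
    """Largest |estimate - Q_max| seen when only linecard (t mod n) broadcasts
    its local maximum queue size in slot t (slot kN + i -> linecard i)."""
    rng = random.Random(seed)
    local_max = [0] * n  # true local maximum queue size at each linecard
    heard = [0] * n      # last value broadcast by each linecard
    worst = 0
    for t in range(slots):
        # Modeling assumption: a local maximum moves by at most 1 per slot.
        local_max = [max(0, m + rng.choice((-1, 0, 1))) for m in local_max]
        heard[t % n] = local_max[t % n]
        estimate = max(heard)
        worst = max(worst, abs(estimate - max(local_max)))
    return worst


print(simulate_qmax_estimate(n=8, slots=10_000))
```

In this model every stored value is at most $N - 1$ slots old, so the gap never exceeds $N - 1$, which is within the $2N$ bound stated above.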

Recall that we need to generate an input/output permutation $H(n)$ randomly in every time slot to update the DISQUO schedule. For simplicity of practical implementation, we can generate the schedule following a Hamiltonian walk [29], instead of using a totally random algorithm. For a switch of size $N$, there are $N!$ input/output permutations. The Hamiltonian walk schedule $H(n)$ visits each of the $N!$ distinct permutations exactly once during every $N!$ slots in a deterministic periodic schedule. This can be simply implemented with a time complexity of $O(1)$ [29].
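The property used here is only that $H(n)$ is a deterministic, periodic sequence that visits each of the $N!$ permutations exactly once every $N!$ slots, so every port can compute it from the slot counter alone. The Python sketch below is a stand-in with that property, not the $O(1)$ Hamiltonian walk of [29]: it unranks $n \bmod N!$ in lexicographic order via the factorial number system, at $O(N^2)$ cost per slot. `permutation_at` is a hypothetical helper name.

```python
import math


def permutation_at(n, slot):
    """H(slot) as a list: entry i is the output paired with input i.

    Lexicographic unranking of slot mod n!, so the sequence is periodic
    with period n! and visits every permutation exactly once per period.
    """
    k = slot % math.factorial(n)
    remaining = list(range(n))
    perm = []
    for pos in range(n, 0, -1):
        idx, k = divmod(k, math.factorial(pos - 1))
        perm.append(remaining.pop(idx))
    return perm


for slot in range(7):
    print(slot, permutation_at(3, slot))  # slot 6 repeats slot 0
```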

In the following, we show how the algorithm of Conjecture 1 can be implemented in a distributed way, without message passing. We will show the performance of the switch by running extensive simulations.

D. Distributed Implementation 

As shown in the basic DISQUO scheduling algorithm, $X(n)$ is generated based on $X(n-1)$. Therefore, each input $i$ needs to keep track of the DISQUO schedule in the previous slot, i.e., for which output $j$, if any, $X_{ij}(n-1) = 1$. Similarly, each output $j$ needs to keep track of for which input $i$, if any, $X_{ij}(n-1) = 1$. Since the algorithm is distributed, there is no message passing between inputs and outputs. DISQUO needs to make sure that the inputs and outputs keep a consistent view of the DISQUO schedule. For example, if $X_{ij}(n) = 1$, both input $i$ and output $j$ should be aware of this.

Since the decision for a crosspoint to join or leave the DISQUO schedule needs the queue length information, inputs are responsible for making the decisions. However, there are two problems that have to be solved: 1) before input $i$ decides to change $X_{ij}$ from 0 to 1, it needs to check the states of all the neighbors of $(i, j)$, in particular the status of output $j$, which is not directly accessible at input $i$; 2) after input $i$ changes the value of $X_{ij}$, this information has to be passed to output $j$. To solve these problems, the distributed DISQUO algorithm is designed to achieve implicit message passing by utilizing the crosspoint buffers.



The input and output scheduling algorithms work as follows.

Input Scheduling Algorithm (ISA)

At each input port $i$, assume $(i, j) \in H(n)$.

1) $\forall j' \neq j$: (a) $X_{ij'}(n) = X_{ij'}(n-1)$.
2) ○ If $X_{ij}(n-1) = 1$:
      (b) $X_{ij}(n) = 1$ with probability $p_{ij}$;
      (c) $X_{ij}(n) = 0$ with probability $\bar{p}_{ij} = 1 - p_{ij}$.
   ○ Else, if $X_{ij}(n-1) = 0$ and there exists a $j'$ such that $X_{ij'}(n-1) = 1$, then $X_{ij'}(n) = 1$ according to case (a) above. Consequently:
      (d) $X_{ij}(n) = 0$.
   ○ Else, there is no $j'$ such that $X_{ij'}(n-1) = 1$, i.e., input $i$ was a free input:
      - If $CB_{ij}$ is empty, output $j$ was free:
          (e) $X_{ij}(n) = 1$ with probability $p_{ij}$;
          (f) $X_{ij}(n) = 0$ with probability $\bar{p}_{ij} = 1 - p_{ij}$.
      - Else,
          (g) $X_{ij}(n) = 0$.
3) If $X_{ij}(n) = 1$, $Q_{ij}(n) > 0$ and $B_{ij}(n) = 0$, then $S^I_{ij}(n) = 1$: input $i$ sends a packet to $CB_{ij}$. Otherwise, if input $i$ is free, it generates $H(n+1)$. Suppose that $(i, j') \in H(n+1)$. If $CB_{ij'}$ is empty, input $i$ serves it. Otherwise, it sends a packet to any empty crosspoint buffer except $CB_{ij}$.

Output Scheduling Algorithm (OSA)

For each output port $j$, assume $(i, j) \in H(n)$.

1) $\forall i' \neq i$: (a) $X_{i'j}(n) = X_{i'j}(n-1)$.
2) ○ If $X_{ij}(n-1) = 1$:
      (b) If at time $n$, $CB_{ij}$ receives a packet from input $i$, $X_{ij}(n) = 1$.
      (c) Otherwise, $X_{ij}(n) = 0$.
   ○ Else, if $X_{ij}(n-1) = 0$ and there exists an $i'$ such that $X_{i'j}(n-1) = 1$, then $X_{i'j}(n) = 1$ according to case (a). So:
      (d) $X_{ij}(n) = 0$.
   ○ Else, there is no $i'$ such that $X_{i'j}(n-1) = 1$, i.e., output $j$ was free:
      - If input $i$ sends a packet to $CB_{ij}$ at the beginning of time $n$:
          (e) $X_{ij}(n) = 1$.
      - Else,
          (f) $X_{ij}(n) = 0$.
3) If $X_{ij}(n) = 1$, then $S^O_{ij}(n) = 1$: output $j$ transmits a packet from $CB_{ij}$. Otherwise, output $j$ is free, and it generates $H(n+1)$. Suppose that $(i', j) \in H(n+1)$. If $CB_{i'j}$ is non-empty, output $j$ serves it. Otherwise, output $j$ picks any non-empty crosspoint to serve.

For the input scheduling algorithm, before input $i$ decides to change $X_{ij}$ from 0 to 1, it has to check the status of output $j$, but this information is not known at input $i$. If input $i$ was free at time $n-1$, it has to generate $H(n)$. Suppose that $(i, j) \in H(n)$. Input $i$ then transmits a packet to $CB_{ij}$ if $CB_{ij}$ was empty at time $n-1$. As specified in step 3) of the OSA, if output $j$ was free, it transmits a packet from $CB_{ij}$ at time $n-1$. Therefore, for cases (e) and (f), input $i$ can infer whether output $j$ was free or not by observing the crosspoint buffer $CB_{ij}$. If $CB_{ij}$ is empty, output $j$ was free at time $n-1$. Otherwise, output $j$ was busy.

Fig. 2. An example of DISQUO schedule updating. One time slot is divided into three phases. $H(n)$ is generated in Phase I to update $X(n-1)$. In Phase II, inputs make their decisions and transmit packets to the crosspoint buffers. In Phase III, outputs update their schedules and transmit packets from the crosspoint buffers. Only $(i, j) \in H(n)$ can join or leave the DISQUO schedule.

For the output schedulers, an output $j$ has to observe the crosspoint buffer $CB_{ij}$ to learn input $i$'s decision following the output scheduling algorithm (OSA) above. For example, in cases (b) and (c), it can learn input $i$'s decision by observing whether a packet is sent to $CB_{ij}$ or not at time $n$. Thus, when $(i, j) \in H(n)$, a packet transmission from input $i$ to $CB_{ij}$ can implicitly pass its decision information to output $j$. If $CB_{ij}$ is not empty at the beginning of time $n$, input $i$ is not able to pass its decision information to output $j$. Recall that if output $j$ was free, it can pick any non-empty crosspoint buffer to serve. Therefore, if output $j$ was free at time $n-1$, it can calculate $H(n)$ in advance, and transmit the packet from $CB_{ij}$ if $(i, j) \in H(n)$ and $CB_{ij}$ is not empty. By doing that, when $(i, j) \in H(n)$, $CB_{ij}$ will always be empty at the beginning of time $n$ if output $j$ was free, so that input $i$ can pass its decision information to output $j$ by sending a packet to the buffer. With this rule, when $(i, j) \in H(n)$, input $i$ can also directly infer that output $j$ was busy if $CB_{ij}$ is not empty at the beginning of time $n$, since otherwise output $j$ would have already transmitted that packet from $CB_{ij}$ at time $n-1$.

E. Example 

To help understand DISQUO, we give an illustrative example here. We assume that a schedule over one time slot can be divided into three phases: a) Phase I: every input and output calculates the Hamiltonian walk schedule $H(n)$; b) Phase II: inputs update the DISQUO schedule based on $H(n)$, and decide the value of $S^I(n)$, after which packets can be sent from inputs to the crosspoint buffers; c) Phase III: outputs update the DISQUO schedule and decide the value of $S^O(n)$ so that they can transmit packets from the crosspoint buffers.

As we can see from Fig. 2(a), the DISQUO schedule at time $n-1$ is $X(n-1) = \{(2, 1), (3, 3)\}$. In Phase I, $H(n) = \{(1, 2), (2, 1), (3, 3)\}$ is generated at each input and output. In the following, we use the example to describe how a crosspoint joins or leaves the DISQUO schedule, and how the input/output schedules $S^I(n)$ and $S^O(n)$ are decided after $X(n)$ is generated.

• How a crosspoint joins the DISQUO schedule: $(1, 2)$ is in $H(n)$ and $X_{12}(n-1) = 0$. Also, input 1 knows that $\forall j$, $X_{1j}(n-1) = 0$, so input 1 was a free input in the previous slot. If output 2 was also a free output, input 1 can decide whether to let $(1, 2)$ join the DISQUO schedule or not, following case (e) or (f) of the ISA. Input 1 cannot know the status of output 2 directly. However, it can learn output 2's status by observing $CB_{12}$. Since $CB_{12}$ is empty, input 1 learns that output 2 was free. It can then decide whether to make $(1, 2)$ active based on $p_{12}$. If its decision is to set $X_{12}(n)$ to 1, it should send a packet to $CB_{12}$. Otherwise, it remains a free input. In the example, the decision of input 1 is to set $X_{12}(n)$ to 1. Thus, $S^I_{12}(n) = 1$, and it sends a packet to $CB_{12}$, as shown in Fig. 2(b). Note that this transmission implicitly passes its decision information to output 2.

Output 2 was a free output, and it observes that in Phase II, input 1 sends a packet to $CB_{12}$. Following case (e) of the OSA, output 2 learns input 1's decision of setting $X_{12}(n)$ to 1. It then updates $X_{12}(n)$ to 1, and thus $S^O_{12}(n) = 1$. Output 2 transmits the packet from $CB_{12}$, which is shown in Fig. 2(c).

• How a crosspoint leaves the DISQUO schedule: $(3, 3)$ is in $H(n)$ and $X_{33}(n-1) = 1$. Following cases (b) and (c), input 3 has to decide whether to keep $(3, 3)$ in the DISQUO schedule or not, based on a probability $p_{33}$, which is a function of the queue size $Q_{33}$. In the example, it decides to set $X_{33}(n)$ to 0. Input 3 becomes a free input. It calculates $H(n+1)$, which we assume is $\{(1, 3), (2, 1), (3, 2)\}$. Since $(3, 2) \in H(n+1)$ and $CB_{32}$ is empty, it sets $S^I_{32}(n) = 1$, and sends a packet to $CB_{32}$ (Fig. 2(b)). Note that by not sending a packet to $CB_{33}$, input 3 implicitly passes its decision of setting $X_{33}(n) = 0$ to output 3.

Output 3 observes that, in Phase II, input 3 did not send any packet to $CB_{33}$. Following case (c) of the OSA, it learns input 3's decision and updates $X_{33}(n)$ to 0. Output 3 becomes a free output. Following the OSA, a free output has to generate $H(n+1)$ at time $n$. Since $(1, 3) \in H(n+1)$ and $CB_{13}$ is not empty, output 3 sets $S^O_{13}(n) = 1$ and transmits the packet from $CB_{13}$, as shown in Fig. 2(c).
From the example, we can see that after $X(n)$ is generated, which is $\{(1, 2), (2, 1)\}$, a packet is transmitted from input 1 to output 2, and one from input 2 to output 1. Besides that, input 3 and output 3, which are free, also transmit a packet. The transmissions by free inputs and outputs can be considered as an augmentation of $X(n)$. In the following, we will show that the weight, defined only on $X(n)$, is close enough to the maximum one to guarantee the throughput. The augmentation by free inputs/outputs, though it does not contribute to the stability of the switch, can improve the delay performance of the system.
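The decision rules in step 2) of the ISA and OSA can be replayed on the scenario of Fig. 2 with the Python sketch below. It is an illustration under stated assumptions: the values of $p_{ij}$ and the uniform draws are hypothetical, chosen so that the outcomes match the example; every VOQ is assumed non-empty; and only packets sent to crosspoints in $X(n)$ are modeled, since those are the only buffers in $H(n)$ that outputs inspect here (input 3's packet to $CB_{32}$ lies outside $H(n)$). Ports are numbered from 1 as in the text, and `isa_decide`/`osa_decide` are hypothetical helper names.

```python
def isa_decide(j, x_in_prev, cb_full, p, u):
    """Input i's view of its match in X(n) after ISA step 2), (i, j) in H(n).

    x_in_prev: output matched to input i in X(n-1), or None if input i was free
    cb_full:   True if CB_ij is non-empty at the start of slot n
    p, u:      p_ij and a uniform draw in [0, 1); u < p means "choose 1"
    """
    if x_in_prev == j:             # cases (b)/(c)
        return j if u < p else None
    if x_in_prev is not None:      # case (a) keeps (i, j'), so case (d)
        return x_in_prev
    if not cb_full:                # free input, CB_ij empty: cases (e)/(f)
        return j if u < p else None
    return None                    # case (g): output j was busy


def osa_decide(i, x_out_prev, packet_from_i):
    """Output j's view of its match in X(n) after OSA step 2), (i, j) in H(n).

    x_out_prev:    input matched to output j in X(n-1), or None if j was free
    packet_from_i: True if input i sent a packet to CB_ij in Phase II
    """
    if x_out_prev == i:            # cases (b)/(c)
        return i if packet_from_i else None
    if x_out_prev is not None:     # case (a) keeps (i', j), so case (d)
        return x_out_prev
    return i if packet_from_i else None   # cases (e)/(f)


# Scenario of Fig. 2 (ports numbered from 1)
h = {1: 2, 2: 1, 3: 3}                             # H(n): input -> output
x_prev = {2: 1, 3: 3}                              # X(n-1): input -> output
cb_full = {(1, 2): False, (2, 1): False, (3, 3): False}
p = {(1, 2): 0.6, (2, 1): 0.7, (3, 3): 0.5}        # hypothetical p_ij values
u = {1: 0.1, 2: 0.2, 3: 0.9}                       # hypothetical uniform draws

# Phase II: inputs decide; a matched input sends a packet to its crosspoint
x_in = {i: isa_decide(j, x_prev.get(i), cb_full[(i, j)], p[(i, j)], u[i])
        for i, j in h.items()}
sent = {(i, j) for i, j in x_in.items() if j is not None}

# Phase III: each output j inspects CB_ij for the (i, j) in H(n)
out_prev = {j: i for i, j in x_prev.items()}
x_out = {j: osa_decide(i, out_prev.get(j), (i, j) in sent) for i, j in h.items()}

print(sorted((i, j) for i, j in x_in.items() if j is not None))   # [(1, 2), (2, 1)]
print(sorted((i, j) for j, i in x_out.items() if i is not None))  # [(1, 2), (2, 1)]
```

Both ends arrive at $X(n) = \{(1, 2), (2, 1)\}$ without exchanging any explicit message, which is the point of the implicit signaling through $CB_{ij}$.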

F. Stationary Distribution

As mentioned before, $\{X(n)\}$ forms a Markov chain. In this section, we derive the stationary distribution of this Markov chain, and show in Lemma 9 that after the system converges, the weight of the DISQUO schedule is always very close to that of the MWM schedule. We can then prove the system stability in Theorem 2.
Lemma 2. If $X(n-1) \in \mathcal{X}$, then $X(n) \in \mathcal{X}$.

Proof: If $X(n-1)$ is a DISQUO schedule, it satisfies Property 1. For an input $i$, it is impossible that there exist $j \neq j'$ such that $X_{ij}(n) = X_{ij'}(n) = 1$, since before input $i$ decides to change $X_{ij}$ from 0 to 1, it always has to make sure that there does not exist a $j'$ such that $X_{ij'}(n-1) = 1$, and $X_{ij'}(n) = X_{ij'}(n-1)$ for every $j' \neq j$ because $H(n)$ contains only one crosspoint of input $i$.

For an output $j$, it is also impossible that there exist $i \neq i'$ such that $X_{ij}(n) = X_{i'j}(n) = 1$. This is because input $i$ can change $X_{ij}$ from 0 to 1 only when output $j$ was free. So, $X_{ij}(n) = X_{i'j}(n) = 1$ only if inputs $i$ and $i'$ both decide to change the values from 0 to 1 in the same time slot, which requires both $(i, j) \in H(n)$ and $(i', j) \in H(n)$. But $H(n)$ is an input/output permutation, so only one $(\cdot, j)$ is in $H(n)$. Therefore, if $X(n-1)$ satisfies Property 1, $X(n)$ also satisfies Property 1. ■

As mentioned before, $X(n-1), X(n), \cdots$ is a Markov chain since $X(n)$ only depends on $X(n-1)$. A transition from a state $X$ to $X'$ can occur only when the Hamiltonian walk schedule $H(n)$ satisfies the condition:
$$(X \cap \overline{X'}) \cup (\overline{X} \cap X') \subseteq H(n).$$
This is because VOQs in $X \cap \overline{X'}$ leave the DISQUO schedule and those in $\overline{X} \cap X'$ join the DISQUO schedule. According to the DISQUO algorithm, only crosspoints in $H(n)$ can join or leave the DISQUO schedule. Therefore, both $X \cap \overline{X'}$ and $\overline{X} \cap X'$ should be in $H(n)$. The following lemma gives the transition probabilities.

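Lemma 3. If $X$ can transit to $X'$, the transition probability can be written as:

$$P_n(X,X') = \sum_{H:\, X \triangle X' \subseteq H} \alpha(H) \prod_{(i,j)\in X\cap\overline{X'}} \bar{p}_{ij} \prod_{(k,l)\in \overline{X}\cap X'} p_{kl} \prod_{(u,v)\in X\cap X'\cap H} p_{uv} \prod_{(x,y)\in H\cap\overline{X\cup X'}\cap\overline{\mathcal{N}(X\cup X')}} \bar{p}_{xy}, \qquad (15)$$

where $\alpha(H)$ is the probability that $H$ is selected, $\bar{p}_{ij} = 1 - p_{ij}$, and $X \triangle X' = (X \cap \overline{X'}) \cup (\overline{X} \cap X')$.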

Proof: Please refer to Appendix A. ■

Lemma 4. The Markov chain {X(n)} is irreducible and positive recurrent.

Proof: Please refer to Appendix B. ■

Since the Markov chain is irreducible and positive recurrent, it has a unique stationary distribution. Let us associate each VOQ with a non-negative weight $W_{ij}(n) = f(Q_{ij}(n))$ at time $n$. The Markov chain has the following stationary distribution.

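Lemma 5. The Markov chain of the system has the following product-form stationary distribution:

$$\pi_n(X) = \frac{1}{Z}\prod_{(i,j)\in X}\frac{p_{ij}}{\bar{p}_{ij}}, \qquad (16)$$

where

$$Z = \sum_{X\in\mathcal{X}}\prod_{(i,j)\in X}\frac{p_{ij}}{\bar{p}_{ij}}. \qquad (17)$$

Proof: If a state $X$ can make a transition to $X'$, we can check that the distribution in Eq. (16) satisfies the detailed balance equation:

$$\pi_n(X)P_n(X,X') = \pi_n(X')P_n(X',X). \qquad (18)$$

Indeed, the set of admissible $H$ and the last two products in Eq. (15) are symmetric in $X$ and $X'$, so $P_n(X,X')/P_n(X',X) = \prod_{(k,l)\in\overline{X}\cap X'}\frac{p_{kl}}{\bar{p}_{kl}} \big/ \prod_{(i,j)\in X\cap\overline{X'}}\frac{p_{ij}}{\bar{p}_{ij}} = \pi_n(X')/\pi_n(X)$. Hence the Markov chain is reversible and Eq. (16) is the stationary distribution (see [30], Theorem 1.2). ■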
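The detailed balance claim of Lemma 5 can be checked numerically on a $2 \times 2$ switch by enumerating every transition exactly. The sketch below assumes that $H$ is drawn uniformly among the $N!$ permutations (so $\alpha(H) = 1/N!$) and that a selected, unblocked VOQ becomes active with probability $p_{ij} = e^{W_{ij}}/(1+e^{W_{ij}})$, so that $p_{ij}/\bar{p}_{ij} = e^{W_{ij}}$; both are assumptions made for illustration, and the function name `transition_matrix` is hypothetical.

```python
import itertools
import math

def transition_matrix(n, weights):
    """Exact P(X, X') for a Definition-8-style update on an n x n switch.

    States are partial matchings (frozensets of (i, j)). Assumption: H(n) is
    a uniformly random permutation, i.e. alpha(H) = 1 / n!.
    """
    cells = [(i, j) for i in range(n) for j in range(n)]
    states = [frozenset(c) for r in range(n + 1)
              for c in itertools.combinations(cells, r)
              if len({i for i, _ in c}) == r and len({j for _, j in c}) == r]
    index = {s: k for k, s in enumerate(states)}
    perms = list(itertools.permutations(range(n)))
    P = [[0.0] * len(states) for _ in states]
    for x in states:
        for perm in perms:
            h = [(i, perm[i]) for i in range(n)]
            # Probability that each crosspoint of H is 1 after the update.
            probs = []
            for (i, j) in h:
                free = not any((a == i or b == j) and (a, b) != (i, j) for (a, b) in x)
                probs.append(1.0 / (1.0 + math.exp(-weights[i][j])) if free else 0.0)
            # Decisions of crosspoints in H are independent (H is a permutation).
            for bits in itertools.product([0, 1], repeat=n):
                pr = 1.0 / len(perms)
                for bit, p in zip(bits, probs):
                    pr *= p if bit else 1.0 - p
                if pr == 0.0:
                    continue
                new = (set(x) - set(h)) | {c for c, bit in zip(h, bits) if bit}
                P[index[x]][index[frozenset(new)]] += pr
    return states, P

w = [[0.3, 1.2], [2.0, 0.7]]
states, P = transition_matrix(2, w)
unnorm = [math.exp(sum(w[i][j] for (i, j) in s)) for s in states]
z = sum(unnorm)
pi = [u / z for u in unnorm]  # Eq. (16) with p / (1 - p) = e^W
worst = max(abs(pi[a] * P[a][b] - pi[b] * P[b][a])
            for a in range(len(states)) for b in range(len(states)))
print(len(states), worst)
```

The largest detailed-balance violation printed at the end should be at the level of floating-point round-off.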



G. System Convergence 

For Glauber dynamics, the weights are fixed over time. Therefore, the convergence rate of the system can be described by the distance between $\mu_n$ and $\pi$. However, under the algorithm presented in the previous section, the weights change over time, so the Glauber dynamics in each time slot $n$ differs from those in other time slots, which means that $\pi_n$ also varies over time. To characterize the convergence rate of this system, we define the distance

$$d_n = \|\mu_n - \pi_n\|_{TV}. \qquad (19)$$

We aim to ensure that for any arbitrarily small $\delta > 0$, there exists a time $T_{mix}(\delta)$ such that $d_n < \delta$ for any $n > T_{mix}(\delta)$, i.e., $\mu_n$ and $\pi_n$ are close enough. By analogy with the definition of mixing time given earlier, $T_{mix}(\delta)$ captures the convergence rate of the system. Therefore, we will refer to $T_{mix}(\delta)$ as the mixing time of this inhomogeneous Markov chain.

In the following lemma, we prove that if the weight function $f(x)$, with $W_{ij}(n) = f(Q_{ij}(n))$, is carefully selected, the system always converges to the distribution $\pi_n$ under the DISQUO scheduling algorithm.

Lemma 6. If $f(x) = \frac{\log(1+x)}{g(x)}$, then for any $\delta > 0$ there exists an $n^*$ such that $\|\mu_n - \pi_n\|_{TV} \le \delta$ holds for all $n > n^*$, where $g(x)$ is a function that satisfies the following conditions:

• $g(x) > 1$ for all $x > 0$.
• $g'(x) > 0$ for all $x > 0$.
• $\lim_{x\to\infty} g(f^{-1}(x)) = \infty$.

Proof: Please refer to Appendix D for the detailed proof. ■

One example of $g(x)$ is $g(x) = \log(e + \log(1 + x))$. Note that, according to Lemma 6, if the weight function is well designed, the system always converges to the product-form distribution expressed in Eq. (16).
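The weight function of Lemma 6 with the example $g(x)$ above can be written down directly. The Python sketch below (the helper names are hypothetical) evaluates $f$, its derivative, and a bisection-based inverse, which makes the conditions on $g$ easy to spot-check numerically. The derivative is the same expression used in Eq. (62) of Appendix D.

```python
import math

def g(x: float) -> float:
    # Example from the text: g(x) = log(e + log(1 + x)); g(0) = 1 and g grows very slowly.
    return math.log(math.e + math.log1p(x))

def f(x: float) -> float:
    # Weight function of Lemma 6: f(x) = log(1 + x) / g(x), for a queue length x >= 0.
    return math.log1p(x) / g(x)

def f_prime(x: float) -> float:
    # f'(x) = 1 / ((1 + x) g(x)) - log(1 + x) g'(x) / g(x)^2,
    # with g'(x) = 1 / ((1 + x) (e + log(1 + x))).
    gp = 1.0 / ((1.0 + x) * (math.e + math.log1p(x)))
    return 1.0 / ((1.0 + x) * g(x)) - math.log1p(x) * gp / g(x) ** 2

def f_inverse(w: float, tol: float = 1e-12) -> float:
    # f is increasing, so f(x) = w has a unique root; bracket it, then bisect.
    lo, hi = 0.0, 1.0
    while f(hi) < w:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for q in [0, 1, 10, 100, 10_000]:
    print(q, round(f(q), 4), round(f_prime(q), 6))
```

The slow growth of $f$ (roughly $\log x / \log\log x$) is what later keeps the per-slot change of the weights small relative to the mixing time.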



H. System Stability 

As shown above, the Markov chain {X(n)} has a finite number of states, and we have already derived its stationary distribution. In the following, we use the MWM algorithm to prove the system stability. For an input-queued switch, the MWM algorithm selects a feasible schedule $S^*(n)$ with the maximum weight:

$$S^*(n) = \arg\max_{S} \sum_{(i,j)\in S} W_{ij}(n). \qquad (20)$$

The algorithm can provide 100% throughput for any admissible traffic in a bufferless crossbar switch. According to Theorem 1, MWM can be extended to a buffered crossbar switch. Following the DISQUO algorithm, if $X_{ij}(n) = 1$, and $Q_{ij}(n) > 0$ or $B_{ij}(n) > 0$, one packet can be transmitted from input $i$ to output $j$. Therefore, we can define the weight of a DISQUO schedule as:

$$W(X) = \sum_{i,j} X_{ij}(n)\,W_{ij}(n). \qquad (21)$$
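For small switches, the MWM schedule of Eq. (20) and the schedule weight of Eq. (21) can be computed by brute force, which is handy for sanity-checking simulation code. This sketch enumerates all $N!$ permutations, so it is only meant for tiny $N$; the weight matrix and the helper names `mwm` and `schedule_weight` are made up for illustration.

```python
from itertools import permutations

def mwm(weights):
    """Brute-force maximum weight matching (Eq. 20) for a small N x N weight matrix.

    With non-negative weights, every matching extends to a full permutation
    without losing weight, so searching the N! permutations suffices.
    Exponential in N: illustration only.
    """
    n = len(weights)
    best = max(permutations(range(n)),
               key=lambda perm: sum(weights[i][perm[i]] for i in range(n)))
    return [(i, best[i]) for i in range(n)]

def schedule_weight(x, weights):
    # W(X) = sum_{i,j} X_ij * W_ij, as in Eq. (21); x is a 0/1 matrix.
    n = len(weights)
    return sum(x[i][j] * weights[i][j] for i in range(n) for j in range(n))

W = [[3.0, 1.0, 0.0],
     [2.0, 4.0, 1.0],
     [0.0, 6.0, 2.0]]
s_star = mwm(W)
x_star = [[1 if (i, j) in s_star else 0 for j in range(3)] for i in range(3)]
print(s_star, schedule_weight(x_star, W))
```

Comparing `schedule_weight` of a DISQUO schedule against that of `mwm` is exactly the ratio that Lemma 7 and Theorem 2 reason about.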



For MWM, the following result has been established in [5].

Lemma 7. Consider a scheduling algorithm. Suppose that, given any $\epsilon$ and $\delta$ such that $0 < \epsilon, \delta < 1$, there exists a $B > 0$ such that in any time slot $n$, with probability greater than $1 - \delta$, the scheduling algorithm chooses a feasible schedule $S(n)$ which satisfies the following condition:

$$\sum_{(i,j)\in S(n)} W_{ij}(n) \ge (1-\epsilon)\sum_{(k,l)\in S^*(n)} W_{kl}(n) \qquad (22)$$

whenever $\|Q(n)\| > B$, where $Q(n) = [Q_{ij}(n)]_{ij}$ and $\|Q(n)\| = \big(\sum_{i,j} Q_{ij}^2(n)\big)^{1/2}$. Then the scheduling algorithm can stabilize the system.

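Theorem 2. DISQUO can stabilize the system if the input traffic is admissible.

Proof: Define the set:

$$\mathcal{K} = \{X \in \mathcal{X} : W(X) < (1-\epsilon)W^*(\mathcal{X})\}.$$

According to Lemma 9 in Appendix C, for any $\delta > 0$, we have $\pi(\mathcal{K}) < \delta$ if the maximum weight satisfies the condition:

$$W^*(\mathcal{X}) > \frac{N^2\log 2}{\epsilon\delta} \ge \frac{\log|\mathcal{X}|}{\epsilon\delta}, \qquad (23)$$

where the second inequality holds because every DISQUO schedule is a subset of the $N^2$ crosspoints, so $|\mathcal{X}| \le 2^{N^2}$. So, for any $\epsilon, \delta > 0$, choose $B > N f^{-1}\big(\frac{N^2\log 2}{\epsilon\delta}\big)$. Then, whenever $\|Q(n)\| > B$,

$$\sum_{i,j} Q_{ij}^2(n) > B^2 > N^2\Big(f^{-1}\Big(\frac{N^2\log 2}{\epsilon\delta}\Big)\Big)^2.$$

Since there are $N^2$ VOQs, $\max_{i,j} Q_{ij}^2(n) > \big(f^{-1}\big(\frac{N^2\log 2}{\epsilon\delta}\big)\big)^2$, and hence $W^*(\mathcal{X}) \ge \max_{i,j} f(Q_{ij}(n)) > \frac{N^2\log 2}{\epsilon\delta}$. Thus, Eq. (23) holds and $\pi(\mathcal{K}) < \delta$. Hence, the scheduling algorithm can stabilize the system according to Lemma 7. ■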
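The queue-length threshold used in the proof of Theorem 2, $B > N f^{-1}\big(\frac{N^2\log 2}{\epsilon\delta}\big)$, can be evaluated for the example $g(x) = \log(e + \log(1+x))$ of Lemma 6. Because $f^{-1}$ grows extremely fast, this sketch works with $\log B$ rather than $B$: substituting $y = \log(1+x)$ turns $f(x) = w$ into $y = w\log(e+y)$, which it solves by bisection. The function names are hypothetical, and the choice of $g$ is the example from the text, not a requirement.

```python
import math

def log_f_inverse(w: float) -> float:
    """log(f^{-1}(w)) for f(x) = log(1+x)/g(x) with g(x) = log(e + log(1+x)), w > 0.

    With y = log(1 + x), f(x) = w becomes y = w * log(e + y). The function
    h(y) = y - w*log(e + y) is convex with h(0) = -w < 0, so it has a unique
    positive root, found here by bisection.
    """
    def h(y: float) -> float:
        return y - w * math.log(math.e + y)

    lo, hi = 0.0, 1.0
    while h(hi) <= 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) <= 0.0 else (lo, mid)
    y = 0.5 * (lo + hi)
    return y + math.log(-math.expm1(-y))  # log(e^y - 1), stable for large y

def log_queue_threshold(n_ports: int, eps: float, delta: float) -> float:
    # log of N * f^{-1}(N^2 log 2 / (eps * delta)), the lower bound on B in Theorem 2.
    w = n_ports ** 2 * math.log(2) / (eps * delta)
    return math.log(n_ports) + log_f_inverse(w)

for n in (2, 4, 8):
    print(n, log_queue_threshold(n, eps=0.1, delta=0.1))
```

The printed values are natural logarithms of $B$; they grow quickly with $N$, which is why the computation avoids forming $B$ directly.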

IV. Simulations 

In this section, we run simulations for different scenarios to evaluate the performance of DISQUO. We study the delay performance of the scheduling algorithm under different traffic patterns, including uniform and non-uniform traffic with Bernoulli and bursty arrivals. Note that the 100% throughput guarantee of DISQUO is proved only for i.i.d. Bernoulli arrivals; for all other arrival processes, DISQUO serves as a heuristic scheduling algorithm. For bursty traffic, the burst length is distributed over [1, 1000], following the truncated Pareto distribution:

$$P(l) = \frac{c}{l^{\alpha}}, \quad l = 1, 2, \ldots, 1000, \qquad (24)$$

where $l$ is the burst length, $\alpha$ is the Pareto distribution parameter, and $c$ is the normalization constant. In the simulations, $\alpha = 1.7$, for which the average burst length is about 11.6. All inputs are equally loaded, and we measure the packet delay. Simulations are run long enough to ensure that the confidence intervals are small enough to make valid comparisons.

Fig. 3. Switch size N=32, uniform traffic for both Bernoulli i.i.d. and bursty arrivals.

Fig. 4. Switch size N=32, lin-diagonal traffic for both Bernoulli i.i.d. and bursty arrivals (curves: Bursty - DISQUO, Bursty - OQ, Bernoulli - DISQUO, Bernoulli - OQ).
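The truncated Pareto burst-length distribution of Eq. (24) is easy to reproduce. The sketch below normalizes $c$ numerically and computes the mean burst length; for $\alpha = 1.7$ the result is close to the value of about 11.6 quoted above. The function names are hypothetical.

```python
def truncated_pareto_pmf(alpha: float, l_max: int = 1000):
    # P(l) = c / l**alpha for l = 1..l_max, with c chosen so the pmf sums to 1 (Eq. 24).
    weights = [l ** -alpha for l in range(1, l_max + 1)]
    c = 1.0 / sum(weights)
    return [c * w for w in weights]

def mean_burst_length(alpha: float, l_max: int = 1000) -> float:
    # E[l] = sum_l l * P(l); pmf[k] is the probability of burst length k + 1.
    pmf = truncated_pareto_pmf(alpha, l_max)
    return sum(l * p for l, p in enumerate(pmf, start=1))

print(round(mean_burst_length(1.7), 2))
```

Sampling burst lengths for a simulator could then use `random.choices(range(1, 1001), weights=pmf)` from the Python standard library.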

A. Uniform Traffic 

For uniform traffic, each new cell is destined to every output port with equal probability. Let $a$ represent the traffic load; the arrival rate between input $i$ and output $j$ is then $a_{ij} = a/N$. The delay performance of DISQUO under uniform Bernoulli and bursty traffic is shown in Fig. 3. We can see that the packet delay of DISQUO is very close to that of an output-queued (OQ) switch. It has been shown that under uniform traffic, even an algorithm as simple as RR-RR can achieve a delay performance close to that of an output-queued switch [11]. However, the RR-RR algorithm cannot achieve 100% throughput when the traffic is non-uniform. Therefore, we next study the performance of DISQUO under non-uniform traffic.

B. Non-uniform Traffic 

We ran the simulations for the following non-uniform traffic patterns:

• Lin-diagonal: Arrival rates at the same input differ linearly, i.e., $a_{i,(i+j) \bmod N} - a_{i,(i+j+1) \bmod N} = \frac{2a}{N(N+1)}$.
• Hot-spot: For input port $i$, $a_{ii} = \omega a$ and $a_{ij} = (1-\omega)a/(N-1)$ for $i \neq j$. We can get different traffic patterns by varying the hot-spot factor $\omega$.

Fig. 6. Results of switches with different sizes, with hot-spot Bernoulli i.i.d. traffic, where ω = 0.5.
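The three traffic patterns can be generated as rate matrices. In this sketch, the lin-diagonal rates are written in the closed form $a_{i,(i+j) \bmod N} = 2a(N-j)/(N(N+1))$, which is one choice consistent with the stated linear difference and with every input carrying load $a$. This closed form, the strict admissibility test (every row and column sum below 1), and the function names are assumptions of the sketch.

```python
def uniform_rates(n: int, load: float):
    # a_ij = a / N for every input/output pair.
    return [[load / n] * n for _ in range(n)]

def lin_diagonal_rates(n: int, load: float):
    # a_{i,(i+j) mod N} = 2a(N - j) / (N(N + 1)) for j = 0..N-1: consecutive
    # diagonals differ by 2a / (N(N + 1)) and each row sums to a.
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rates[i][(i + j) % n] = 2.0 * load * (n - j) / (n * (n + 1))
    return rates

def hot_spot_rates(n: int, load: float, omega: float):
    # a_ii = omega * a; the remaining (1 - omega) * a is spread over the other N - 1 outputs.
    return [[omega * load if i == j else (1.0 - omega) * load / (n - 1)
             for j in range(n)] for i in range(n)]

def is_admissible(rates) -> bool:
    # Strictly admissible: every input (row) and output (column) load is below 1.
    n = len(rates)
    rows = [sum(r) for r in rates]
    cols = [sum(rates[i][j] for i in range(n)) for j in range(n)]
    return max(rows + cols) < 1.0

for name, m in [("uniform", uniform_rates(4, 0.9)),
                ("lin-diagonal", lin_diagonal_rates(4, 0.9)),
                ("hot-spot", hot_spot_rates(4, 0.9, 0.5))]:
    print(name, [round(v, 4) for v in m[0]], is_admissible(m))
```

All three constructions load every output by $a$ as well, so they remain admissible for any $a < 1$, including the load of 0.99 used in the experiments.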

The delay performance for lin-diagonal and hot-spot traffic is shown in Fig. 4 and Fig. 5, respectively. We can see that under Bernoulli traffic, the delay performance of DISQUO is still very close to that of the output-queued switch. Packets have low delay even when the load is as high as 0.99. Note that the RR-RR algorithm achieves a throughput of only about 85% under hot-spot traffic [12]. DISQUO also remained stable in all the bursty traffic scenarios that we simulated.

C. Impact of Switch Size 

We also study the impact of switch size on the delay performance. Generally, for input-queued switches, the average delay increases linearly with the switch size [7]. For output-queued switches, delay is independent of the switch size. Fig. 6 shows the delay performance of DISQUO with different switch sizes under Bernoulli hot-spot traffic with ω = 0.5. We can see that the delay is almost the same for different switch sizes. As the size increases, the delay even decreases slightly. This is partly because, as the switch size increases, the number of crosspoint buffers increases as well, and the crosspoint buffers play a key role in reducing the average delay.



D. Impact of Buffer Size 

If the buffer at each crosspoint increases to infinity, the 
buffered crossbar switch is then equivalent to an output-queued 
switch. So if we increase the buffer size, the average delay will 
decrease and converge to the delay of an output-queued switch. 
As we already showed in previous simulation results, the delay 
performance of DISQUO with a buffer size of 1 is already 
very close to that of an output-queued switch. Therefore, by 
increasing the buffer size, we can only get a very marginal 
improvement in delay performance. DISQUO can easily be modified for crosspoint buffer sizes K > 1; due to space considerations, we do not define DISQUO with K > 1 here. Fig. 7 shows the delay performance of DISQUO with different buffer sizes under hot-spot traffic. We can see that the improvement is
small. Therefore, we only need to implement a one-cell buffer 
at each crosspoint and still provide good delay performance. 
This is crucial since current technology limits the size of 
crosspoint buffers to a small number. 

V. Conclusion 

In this paper, we proposed a distributed scheduling algorithm (DISQUO) for crosspoint buffered switches with a crosspoint buffer size as small as one and no speedup. The computational complexity of DISQUO is only O(1) per port, and we proved that it can achieve 100% throughput for any admissible Bernoulli i.i.d. traffic. We evaluated the performance of DISQUO through extensive simulations. The results show that DISQUO provides delay performance very close to that of an output-queued switch. With DISQUO, the average queuing delay of a packet is essentially independent of the switch size, which makes it well suited to large-scale switching system design.
Appendix

A. Proof of Lemma 3

Proof: The transition occurs only when the VOQs in $H$ satisfy the conditions below:

1) For any $(i,j) \in X \cap \overline{X'}$: the VOQ is selected by $H$ and decides to change its scheduling decision from 1 to 0, which happens with probability $\bar{p}_{ij}$.
2) For any $(k,l) \in \overline{X} \cap X'$: the VOQ is selected by $H$ and decides to change its scheduling decision from 0 to 1, which happens with probability $p_{kl}$.
3) For any $(u,v) \in X \cap X' \cap H$: the VOQ was in the DISQUO schedule of the previous time slot, and even though it is selected by $H$, it decides to keep its state, which occurs with probability $p_{uv}$.
4) For any $(x,y) \in H \cap \overline{X \cup X'} \cap \overline{\mathcal{N}(X)}$: neither the VOQ nor any of its neighbors was in the DISQUO schedule of the previous time slot, and even though it is selected by $H$, it decides to keep its state, which occurs with probability $\bar{p}_{xy}$. Since $H$ is a DISQUO schedule and $\overline{X} \cap X' \subseteq H$, we have $H \cap \mathcal{N}(\overline{X} \cap X') = \emptyset$. Thus

$$H \cap \overline{X\cup X'} \cap \overline{\mathcal{N}(X)} = H \cap \overline{X\cup X'} \cap \overline{\mathcal{N}(X\cup X')}.$$

We use the latter form in Eq. (15) for the proof of the stationary distribution in the following.

Since $H$ is a permutation of the inputs and outputs, no two VOQs in $H$ are neighbors of each other. Therefore, they can make their scheduling decisions independently. We can then multiply the probabilities of all four categories above, which leads to the transition probability given by Eq. (15). ■

B. Proof of Lemma 4

Proof: Suppose that $X$ is a DISQUO schedule with $k$ non-zero elements $(i_1,j_1), (i_2,j_2), \ldots, (i_k,j_k) \in X$. Let $X_l$ represent the DISQUO schedule with $l$ non-zero elements $(i_1,j_1), (i_2,j_2), \ldots, (i_l,j_l) \in X_l \subseteq X$, $0 \le l \le k$. We can see that $X_0 = \emptyset$ and $X_k = X$. Since $X$ is a DISQUO schedule, $X_l$ is also a DISQUO schedule and $X_{l-1} \cup X_l = X_l \in \mathcal{X}$. Therefore, the system can make a transition from $X_{l-1}$ to $X_l$ with positive probability when $(i_l, j_l) \in H(n)$, as already proved in Lemma 3. Hence, state $X_0$ can reach any state $X \in \mathcal{X}$ with positive probability in a finite number of steps, and vice versa. Thus, the Markov chain is irreducible and, since its state space is finite, positive recurrent. ■

C. Lemmas for System Stability 

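Lemma 8. Suppose that $T(\cdot)$ is a function defined on a set $\mathcal{X}$. For any probability distribution $\mu$ on $\mathcal{X}$, define the function:

$$F(\mu, T(X)) = E_{\mu}[T(X)] + \mathcal{H}(\mu), \qquad (25)$$

where $\mathcal{H}(\mu)$ is the entropy function $\mathcal{H}(\mu) = -\sum_{X\in\mathcal{X}}\mu(X)\log\mu(X)$. Then $F(\cdot)$ is uniquely maximized by the distribution:

$$\mu^*(X) = \frac{1}{Z}\exp(T(X)), \qquad (26)$$

where $Z = \sum_{X\in\mathcal{X}}\exp(T(X))$.

Proof: For any probability distribution $\mu$, we have:

$$\begin{aligned}
F(\mu, T(X)) &= E_{\mu}[T(X)] + \mathcal{H}(\mu) \\
&= \sum_{X}\mu(X)T(X) - \sum_{X}\mu(X)\log\mu(X) \\
&= \sum_{X}\mu(X)\big(\log\mu^*(X) + \log Z\big) - \sum_{X}\mu(X)\log\mu(X) \\
&= \sum_{X}\mu(X)\log Z + \sum_{X}\mu(X)\log\frac{\mu^*(X)}{\mu(X)} \\
&\le \log Z\sum_{X}\mu(X) + \log\Big(\sum_{X}\mu(X)\frac{\mu^*(X)}{\mu(X)}\Big) \\
&= \log Z, \qquad (27)
\end{aligned}$$

where the inequality follows from Jensen's inequality, with equality holding only when $\mu = \mu^*$. ■

Note that when $T(X) = 0$, the uniform distribution maximizes $F(\mu, 0)$, and we have:

$$F(\mu, 0) = \mathcal{H}(\mu) \le \log Z = \log|\mathcal{X}|, \qquad (28)$$

where $|\mathcal{X}|$ is the size of $\mathcal{X}$.

Lemma 9. Let $W(\cdot)$ be the weight function and $W^*(\mathcal{X})$ the maximum weight. Define the set:

$$\mathcal{K} = \{X \in \mathcal{X} : W(X) < (1-\epsilon)W^*(\mathcal{X})\}. \qquad (29)$$

Then, we have:

$$\pi(\mathcal{K}) \le \frac{\log|\mathcal{X}|}{\epsilon W^*(\mathcal{X})}. \qquad (30)$$

Proof: As shown in Eq. (16), for a schedule $X \in \mathcal{X}$, its stationary distribution is $\pi(X) = \frac{1}{Z}\prod_{(i,j)\in X}e^{W_{ij}} = \frac{1}{Z}e^{W(X)}$. According to Lemma 8, $\pi$ maximizes $F(\mu, W(X))$.

Let $X^*$ be the schedule that maximizes the weight, and let $\pi'$ be the distribution that assigns all probability to $X^*$:

$$\pi'(X) = \begin{cases} 1, & \text{if } X = X^*, \\ 0, & \text{otherwise.} \end{cases}$$

Then we have:

$$\begin{aligned}
F(\pi', W(X)) &= E_{\pi'}[W(X)] + \mathcal{H}(\pi') = W^*(\mathcal{X}) + \mathcal{H}(\pi') \\
&\le F(\pi, W(X)) = E_{\pi}[W(X)] + \mathcal{H}(\pi) \\
&\le W^*(\mathcal{X})\big(1 - \pi(\mathcal{K})\big) + W^*(\mathcal{X})(1-\epsilon)\pi(\mathcal{K}) + \mathcal{H}(\pi) \\
&= W^*(\mathcal{X})\big(1 - \epsilon\pi(\mathcal{K})\big) + \mathcal{H}(\pi). \qquad (31)
\end{aligned}$$

The second inequality in Eq. (31) uses Eq. (29). So,

$$\begin{aligned}
W^*(\mathcal{X}) + \mathcal{H}(\pi') &\le W^*(\mathcal{X})\big(1 - \epsilon\pi(\mathcal{K})\big) + \mathcal{H}(\pi), \\
\epsilon\pi(\mathcal{K})W^*(\mathcal{X}) &\le \mathcal{H}(\pi) - \mathcal{H}(\pi') \le \mathcal{H}(\pi) \le \log|\mathcal{X}|, \\
\pi(\mathcal{K}) &\le \frac{\log|\mathcal{X}|}{\epsilon W^*(\mathcal{X})}. \qquad (32)
\end{aligned}$$

■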
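Lemmas 8 and 9 can be spot-checked by brute force on a $3 \times 3$ switch: enumerate all DISQUO schedules, form $\pi(X) \propto e^{W(X)}$, confirm that it attains $F = \log Z$ while another distribution does no better, and compare $\pi(\mathcal{K})$ with the bound of Eq. (30). The random weights, $\epsilon = 0.2$, and the helper names are illustrative assumptions.

```python
import itertools
import math
import random

def matchings(n):
    """All DISQUO schedules (partial matchings) of an n x n switch, as frozensets of (i, j)."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    result = []
    for r in range(n + 1):
        for combo in itertools.combinations(cells, r):
            if len({i for i, _ in combo}) == r and len({j for _, j in combo}) == r:
                result.append(frozenset(combo))
    return result

def objective(mu, t):
    # F(mu, T) = E_mu[T] + H(mu), Eq. (25); states with mu(X) = 0 add no entropy.
    return sum(m * tx - (m * math.log(m) if m > 0 else 0.0) for m, tx in zip(mu, t))

n, eps = 3, 0.2
rng = random.Random(1)
w = {(i, j): rng.uniform(0.0, 6.0) for i in range(n) for j in range(n)}
states = matchings(n)
weights = [sum(w[c] for c in x) for x in states]  # W(X), Eq. (21)
z = sum(math.exp(v) for v in weights)
pi = [math.exp(v) / z for v in weights]           # pi(X) = e^{W(X)} / Z

# Lemma 8: the Gibbs distribution attains F = log Z; a random distribution does no better.
raw = [rng.random() for _ in states]
total = sum(raw)
other = [r / total for r in raw]

# Lemma 9: probability of "bad" schedules versus the bound of Eq. (30).
w_star = max(weights)
pi_k = sum(p for p, v in zip(pi, weights) if v < (1 - eps) * w_star)
bound = math.log(len(states)) / (eps * w_star)
print(len(states), round(objective(pi, weights) - math.log(z), 12), round(pi_k, 4), round(bound, 4))
```

Note that Eq. (30) is only informative once $W^*(\mathcal{X})$ is large compared with $\log|\mathcal{X}|/\epsilon$, which is the regime the proof of Theorem 2 enters when queues are long.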



D. Proof of System Convergence 

Before presenting the proof, we need to introduce some preliminaries. We first define a matrix norm, which will be useful in determining the mixing time of a finite-state Markov chain.

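Definition 7. (Matrix norm) Consider a $|\Omega|\times|\Omega|$ non-negative matrix $A \in \mathbb{R}_+^{|\Omega|\times|\Omega|}$ and a given vector $\pi \in \mathbb{R}_+^{|\Omega|}$. Then, the matrix norm of $A$ with respect to $\pi$ is defined as:

$$\|A\|_{2,\pi} = \sup_{v:\, E_{\pi}[v] = 0}\frac{\|Av\|_{2,\pi}}{\|v\|_{2,\pi}}, \qquad (33)$$

where $v \in \mathbb{R}^{|\Omega|}$, $E_{\pi}[v] = \sum_i \pi_i v_i$, and $\|v\|_{2,\pi} = \big(\sum_i \pi_i v_i^2\big)^{1/2}$.

It is easy to check that the matrix norm has the following properties [26].

Property 6. For a matrix $A \in \mathbb{R}_+^{|\Omega|\times|\Omega|}$, $\pi \in \mathbb{R}_+^{|\Omega|}$, and $a \in \mathbb{R}$:

$$\|aA\|_{2,\pi} = |a|\,\|A\|_{2,\pi}. \qquad (34)$$

Property 7. Let $A$ and $B$ be the transition matrices of two reversible Markov chains with the same stationary distribution $\pi$. We then have:

$$\|AB\|_{2,\pi} \le \|A\|_{2,\pi}\,\|B\|_{2,\pi}. \qquad (35)$$

Property 8. Let $P$ be the transition matrix of a reversible Markov chain with stationary distribution $\pi$. We then have:

$$\|P\|_{2,\pi} \le e_{max}, \qquad (36)$$

where $e_{max} = \max\{|e| : e \neq 1,\ e \text{ is an eigenvalue of } P\}$ and $0 \le e_{max} < 1$.

With this definition and these properties, it follows that for any distribution $\mu$ on $\Omega$, we have [26]:

$$\Big\|\frac{\mu P}{\pi} - \mathbf{1}\Big\|_{2,\pi} \le \|P\|_{2,\pi}\,\Big\|\frac{\mu}{\pi} - \mathbf{1}\Big\|_{2,\pi}, \qquad (37)$$

where the division is componentwise. Then, if the Markov chain is time-reversible, after $t$ steps we have:

$$\Big\|\frac{\mu(0)P^{t}}{\pi} - \mathbf{1}\Big\|_{2,\pi} \le \|P\|_{2,\pi}^{t}\,\Big\|\frac{\mu(0)}{\pi} - \mathbf{1}\Big\|_{2,\pi} \le e_{max}^{t}\,\Big\|\frac{\mu(0)}{\pi} - \mathbf{1}\Big\|_{2,\pi}. \qquad (38)$$

Since

$$\Big\|\frac{\mu(0)}{\pi} - \mathbf{1}\Big\|_{2,\pi}^2 \le \sum_i\frac{\mu_i(0)^2}{\pi(i)} \le \frac{1}{\min_i\pi(i)}, \qquad (39)$$

for any $\delta > 0$, we have $\big\|\frac{\mu(0)P^{t}}{\pi} - \mathbf{1}\big\|_{2,\pi} < \delta$ if

$$t > \frac{\frac{1}{2}\log(1/\pi_{min}) + \log(1/\delta)}{\log(1/e_{max})}, \qquad (40)$$

where $\pi_{min} = \min_i \pi_i$. The equation above suggests that the mixing time of a reversible Markov chain with transition matrix $P$ scales with $\frac{1}{1 - e_{max}}$, where $e_{max} = \max\{|e| : e \neq 1,\ e \text{ is an eigenvalue of } P\}$. Therefore, in the following, we refer to the mixing time of a reversible Markov chain with transition matrix $P$ as $T_{mix} = \frac{1}{1 - e_{max}}$.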
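The bound of Eq. (40) can be illustrated on the simplest reversible chain, a two-state chain whose nontrivial eigenvalue is known in closed form. The sketch below (hypothetical helper names, made-up parameters $a = 0.1$ and $b = 0.3$) computes the number of steps prescribed by Eq. (40), runs the chain from a point mass, and reports the distance $\|\mu/\pi - \mathbf{1}\|_{2,\pi}$ together with $T_{mix} = 1/(1 - e_{max})$.

```python
import math

def two_state_chain(a: float, b: float):
    """Reversible chain P = [[1-a, a], [b, 1-b]] with a, b > 0 and a + b < 1.

    Its eigenvalues are 1 and 1 - a - b, so e_max = 1 - a - b, and its
    stationary distribution is pi = (b, a) / (a + b).
    """
    p = [[1.0 - a, a], [b, 1.0 - b]]
    pi = [b / (a + b), a / (a + b)]
    return p, pi, 1.0 - a - b

def step(mu, p):
    # One step of the chain: the row vector mu times the matrix P.
    return [sum(mu[i] * p[i][j] for i in range(len(mu))) for j in range(len(mu))]

def chi2_distance(mu, pi):
    # ||mu/pi - 1||_{2,pi} = sqrt(sum_i pi_i (mu_i/pi_i - 1)^2)
    return math.sqrt(sum(q * (m / q - 1.0) ** 2 for m, q in zip(mu, pi)))

def steps_needed(pi, e_max, delta):
    # Smallest integer t strictly above the right-hand side of Eq. (40).
    rhs = (0.5 * math.log(1.0 / min(pi)) + math.log(1.0 / delta)) / math.log(1.0 / e_max)
    return math.floor(rhs) + 1

p, pi, e_max = two_state_chain(0.1, 0.3)
delta = 1e-3
t = steps_needed(pi, e_max, delta)
mu = [1.0, 0.0]  # start from a point mass on state 0
for _ in range(t):
    mu = step(mu, p)
print(t, chi2_distance(mu, pi), 1.0 / (1.0 - e_max))  # last value: T_mix = 1/(1 - e_max)
```

For a two-state chain, $\mu(0) - \pi$ is a left eigenvector for $e_{max}$, so the distance decays exactly as $e_{max}^t$; in larger chains Eq. (38) is only an upper bound.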

Recall that, following the updating rules of the DISQUO algorithm, there are at most $N$ updates in every time slot, where $N$ is the number of ports. Therefore, we consider a multiple-update Glauber dynamics defined as follows.

Definition 8. (Multiple-update Glauber dynamics) Consider a graph $G(V,E)$ with $W = [W_i]_{i\in V}$, a vector of weights associated with the vertices. Multiple-update Glauber dynamics (MUGD) is a Markov chain over $\mathcal{I}(G)$, the set of independent sets of $G$. Suppose that the chain is at state $X(n-1) = [X_i(n-1)]_{i\in V}$ at time $n-1$. The next transition of multiple-update Glauber dynamics follows the rules:

• Pick a set $H(n) \in \mathcal{I}(G)$ at random.
• For $i \in H(n)$:
  - If $X_j(n-1) = 0$ for all $j \in \mathcal{N}(i)$, then
  $$X_i(n) = \begin{cases} 1 & \text{with probability } \frac{\exp(W_i)}{1+\exp(W_i)}, \\ 0 & \text{otherwise.} \end{cases}$$
  - Otherwise, $X_i(n) = 0$.
• $X_i(n) = X_i(n-1)$ for all $i \notin H(n)$.

The transition matrix is similar to Eq. (15), but with a vector of fixed weights. The multiple-update Glauber dynamics is also a positive recurrent, time-reversible Markov chain. It is easy to verify that the product-form stationary distribution in Eq. (16) satisfies the detailed balance equation in Eq. (18), so it is also the stationary distribution of the multiple-update Glauber dynamics. In the following lemma, we give an upper bound on the mixing time of the multiple-update Glauber dynamics.

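Lemma 10. (Mixing time of multiple-update Glauber dynamics) Let $P$ be the transition matrix of the multiple-update Glauber dynamics on a graph $G = (V, E)$ with $N$ vertices and weights $W = [W_i]_{i\in V}$. We have:

$$T_{mix} \le 2^{6N}\exp(4NW_{max}),$$

where $W_{max} = \max_{i\in V}W_i$.

Proof: For a nonempty set $A \subset \mathcal{I}(G)$, let $\pi(A) = \sum_{X\in A}\pi(X)$, and define

$$F(A) = \sum_{X\in A,\,X'\in A^c}\pi(X)P(X,X').$$

The conductance of the transition matrix $P$ is defined as:

$$\Phi(P) = \min_{A\subset\mathcal{I}(G):\,\pi(A)\le\frac{1}{2}}\frac{F(A)}{\pi(A)}.$$

There is a well-known conductance bound [32], [33] of the form:

$$e_{max} \le 1 - \frac{\Phi^2(P)}{2}.$$

Now, we have

$$\begin{aligned}
\Phi(P) &= \min_{A\subset\mathcal{I}(G):\,\pi(A)\le\frac{1}{2}}\frac{\sum_{X\in A,\,X'\in A^c}\pi(X)P(X,X')}{\pi(A)} \\
&\ge 2\min_{A\subset\mathcal{I}(G)}F(A) \\
&\ge 2\min_{X\neq X',\,P(X,X')\neq 0}\pi(X)P(X,X') \\
&\ge 2\min_{X}\pi(X)\min_{X\neq X',\,P(X,X')\neq 0}P(X,X').
\end{aligned}$$

For the Glauber dynamics, the stationary distribution can be lower bounded by:

$$\pi(X) \ge \frac{1}{\sum_{X\in\mathcal{I}(G)}\exp\big(\sum_{i\in X}W_i\big)} \ge \frac{1}{|\mathcal{I}(G)|\exp(NW_{max})} \ge \frac{1}{2^N\exp(NW_{max})}.$$

Also, we have:

$$\min_{X\neq X',\,P(X,X')\neq 0}P(X,X') \ge \frac{1}{2^N}\Big(\frac{1}{1+\exp(W_{max})}\Big)^N.$$

So,

$$\Phi(P) \ge \frac{2}{2^{2N}\big(1+\exp(W_{max})\big)^N\exp(NW_{max})} \ge \frac{2}{2^{3N}\exp(2NW_{max})}.$$

Thus,

$$e_{max} \le 1 - \frac{2}{2^{6N}\exp(4NW_{max})} \le 1 - \frac{1}{2^{6N}\exp(4NW_{max})}.$$

Since $T_{mix} = \frac{1}{1 - e_{max}}$, we have:

$$T_{mix} \le 2^{6N}\exp(4NW_{max}).$$

■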

Now, we are ready to prove Lemma 6. We first identify the condition for the system to converge in Lemma 11. Then, in Lemma 12, we prove that if the weight function $f(\cdot)$ is well designed, the condition for system convergence can be satisfied, which finishes the proof of Lemma 6. The proof of Lemma 11 is mainly adapted from [29], [26].

Let $P_n$ denote the transition matrix at time $n$, $e_{max}(n) = \max\{|e| : e \neq 1,\ e \text{ is an eigenvalue of } P_n\}$, and $T_n = \frac{1}{1 - e_{max}(n)}$, which is the mixing time of the multiple-update Glauber dynamics with weight vector $W(n)$.

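In the following lemma, we prove that, given the condition $\alpha_n T_{n+1} < \delta/8$ ($\forall\delta > 0$), the system converges within finite time, where $\alpha_n$ is defined in Eq. (42). We also give an upper bound on the mixing time of the system.

Lemma 11. If $\alpha_n T_{n+1} < \delta/8$ for $\delta > 0$, then $\|\mu_n - \pi_n\|_{TV} \le \delta$ holds for all $n > n^*$, where $T_{n+1}$ is the mixing time of the multiple-update Glauber dynamics with weight vector $W(n+1)$,

$$\alpha_n = \sum_{i,j}\big(f'(Q_{ij}(n)) + f'(Q_{ij}(n+1))\big), \qquad (42)$$

and

$$n^* = \min\Big\{n : \frac{3}{8}\sum_{k=1}^{n}\frac{1}{T_k} \ge \log\frac{2}{\delta} + \frac{N^2}{2}\big(\log 2 + W_{max}(0)\big)\Big\}. \qquad (43)$$

Proof: Without loss of generality, assume $\delta \le 1$, since the total variation distance never exceeds 1. For a vector $x$ and a distribution $\pi$, we write $\|x\|_{2,1/\pi} = \big(\sum_X x(X)^2/\pi(X)\big)^{1/2} = \|x/\pi\|_{2,\pi}$. The stationary distributions of the multiple-update Glauber dynamics with weight vectors $W(n)$ and $W(n+1)$ can be written as

$$\pi_n(X) = \frac{1}{Z(n)}\exp\Big(\sum_{(i,j)\in X}W_{ij}(n)\Big) \quad\text{and}\quad \pi_{n+1}(X) = \frac{1}{Z(n+1)}\exp\Big(\sum_{(i,j)\in X}W_{ij}(n+1)\Big), \qquad (44)$$

respectively, and

$$\frac{Z(n)}{Z(n+1)} = \frac{\sum_{X\in\mathcal{I}(G)}\exp\big(\sum_{(i,j)\in X}W_{ij}(n)\big)}{\sum_{X\in\mathcal{I}(G)}\exp\big(\sum_{(i,j)\in X}W_{ij}(n+1)\big)} \le \max_X\exp\Big(\sum_{(i,j)\in X}\big(W_{ij}(n) - W_{ij}(n+1)\big)\Big).$$

Note that $W_{ij}(n) = f(Q_{ij}(n))$, and $f(\cdot)$ is an increasing concave function, so $f(b) - f(a) \le f'(a)(b-a)$. Therefore,

$$W_{ij}(n) - W_{ij}(n+1) \le f'(Q_{ij}(n+1))\big(Q_{ij}(n) - Q_{ij}(n+1)\big) \le f'(Q_{ij}(n+1)).$$

The last inequality holds because, in every time slot, there is at most one arrival and one departure, so that $-1 \le Q_{ij}(n) - Q_{ij}(n+1) \le 1$. So,

$$\frac{Z(n)}{Z(n+1)} \le \max_X\exp\Big(\sum_{(i,j)\in X}f'(Q_{ij}(n+1))\Big) \le \exp\Big(\sum_{i,j}f'(Q_{ij}(n+1))\Big). \qquad (45)$$

Similarly, we have

$$\frac{Z(n+1)}{Z(n)} \le \exp\Big(\sum_{i,j}f'(Q_{ij}(n))\Big). \qquad (46)$$

From Eq. (44) and Eq. (45), we have:

$$\frac{\pi_{n+1}(X)}{\pi_n(X)} = \frac{Z(n)}{Z(n+1)}\exp\Big(\sum_{(i,j)\in X}\big(W_{ij}(n+1) - W_{ij}(n)\big)\Big) \le \exp\Big(\sum_{i,j}f'(Q_{ij}(n)) + f'(Q_{ij}(n+1))\Big), \qquad (47)$$

and, in the same way, from Eq. (44) and Eq. (46),

$$\frac{\pi_n(X)}{\pi_{n+1}(X)} \le \exp\Big(\sum_{i,j}f'(Q_{ij}(n)) + f'(Q_{ij}(n+1))\Big). \qquad (48)$$

With $\alpha_n$ as defined in Eq. (42), we have:

$$\exp(-\alpha_n) \le \frac{\pi_n(X)}{\pi_{n+1}(X)} \le \exp(\alpha_n). \qquad (49)$$

Recall that $\alpha_n T_{n+1} < \delta/8$, and $T_{n+1} = \frac{1}{1 - e_{max}(n+1)} \ge 1$ is the mixing time of the multiple-update Glauber dynamics with weight vector $W(n+1)$. Hence $0 \le \alpha_n < \delta/8 < 1$. Since $1 - x \le e^{-x}$ and $e^{x} \le 1 + 2x$ for $x \in [0, 1]$, we have

$$-\alpha_n \le \frac{\pi_n(X)}{\pi_{n+1}(X)} - 1 \le 2\alpha_n.$$

Then,

$$\|\pi_{n+1} - \pi_n\|_{2,1/\pi_{n+1}}^2 = \sum_X\pi_{n+1}(X)\Big(\frac{\pi_n(X)}{\pi_{n+1}(X)} - 1\Big)^2 \le 4\alpha_n^2\sum_X\pi_{n+1}(X) = 4\alpha_n^2.$$

Thus,

$$\|\pi_{n+1} - \pi_n\|_{2,1/\pi_{n+1}} \le 2\alpha_n.$$

The distance between $\mu_n$ and $\pi_n$ can then be bounded by:

$$\|\mu_n - \pi_n\|_{2,1/\pi_n} \le \|\mu_n - \pi_{n-1}\|_{2,1/\pi_n} + \|\pi_{n-1} - \pi_n\|_{2,1/\pi_n} \le \|\mu_n - \pi_{n-1}\|_{2,1/\pi_n} + 2\alpha_{n-1}. \qquad (50)$$

Note that

$$\|\mu_n - \pi_{n-1}\|_{2,1/\pi_n}^2 = \sum_X\frac{\pi_{n-1}(X)}{\pi_n(X)}\cdot\frac{\big(\mu_n(X) - \pi_{n-1}(X)\big)^2}{\pi_{n-1}(X)} \le e^{\alpha_{n-1}}\|\mu_n - \pi_{n-1}\|_{2,1/\pi_{n-1}}^2. \qquad (51)$$

From Eq. (50) and Eq. (51), we have:

$$\|\mu_n - \pi_n\|_{2,1/\pi_n} \le \exp(\alpha_{n-1}/2)\|\mu_n - \pi_{n-1}\|_{2,1/\pi_{n-1}} + 2\alpha_{n-1} \le (1 + \alpha_{n-1})\|\mu_n - \pi_{n-1}\|_{2,1/\pi_{n-1}} + 2\alpha_{n-1}. \qquad (52)$$

Let us define

$$\beta_n = \|\mu_{n+1} - \pi_n\|_{2,1/\pi_n}. \qquad (53)$$

Note that $\alpha_n \le \alpha_n T_{n+1} < \delta/8$. If $\beta_{n-1} \le \delta/2$, then from Eq. (52) and the Cauchy-Schwarz inequality, we have:

$$\|\mu_n - \pi_n\|_{TV} \le \|\mu_n - \pi_n\|_{2,1/\pi_n} \le \Big(1 + \frac{\delta}{8}\Big)\frac{\delta}{2} + \frac{\delta}{4} < \delta. \qquad (54)$$

Therefore, to establish the result, we have to prove that $\beta_n \le \delta/2$ holds for all $n \ge n^*$. Consider the following:

$$\begin{aligned}
\beta_{n+1} &= \|\mu_{n+2} - \pi_{n+1}\|_{2,1/\pi_{n+1}} = \Big\|\frac{\mu_{n+1}P_{n+1}}{\pi_{n+1}} - \mathbf{1}\Big\|_{2,\pi_{n+1}} \\
&\le \|P_{n+1}\|_{2,\pi_{n+1}}\Big\|\frac{\mu_{n+1}}{\pi_{n+1}} - \mathbf{1}\Big\|_{2,\pi_{n+1}} \\
&\le e_{max}(n+1)\|\mu_{n+1} - \pi_{n+1}\|_{2,1/\pi_{n+1}} \\
&\le \Big(1 - \frac{1}{T_{n+1}}\Big)\big(\|\mu_{n+1} - \pi_n\|_{2,1/\pi_{n+1}} + \|\pi_n - \pi_{n+1}\|_{2,1/\pi_{n+1}}\big) \\
&\le \Big(1 - \frac{1}{T_{n+1}}\Big)\big(\|\mu_{n+1} - \pi_n\|_{2,1/\pi_{n+1}} + 2\alpha_n\big) \\
&\le \Big(1 - \frac{1}{T_{n+1}}\Big)\big(\exp(\alpha_n/2)\|\mu_{n+1} - \pi_n\|_{2,1/\pi_n} + 2\alpha_n\big) \\
&= \Big(1 - \frac{1}{T_{n+1}}\Big)\big(\exp(\alpha_n/2)\beta_n + 2\alpha_n\big) \\
&\le \Big(1 - \frac{1}{T_{n+1}}\Big)\big((1 + \alpha_n)\beta_n + 2\alpha_n\big). \qquad (55)
\end{aligned}$$

Suppose that $\beta_n \le \delta/2$. Since $\alpha_n < \delta/(8T_{n+1})$, Eq. (55) gives:

$$\beta_{n+1} \le \Big(1 - \frac{1}{T_{n+1}}\Big)\Big(\Big(1 + \frac{\delta}{8T_{n+1}}\Big)\frac{\delta}{2} + \frac{\delta}{4T_{n+1}}\Big) = \frac{\delta}{2}\Big(1 - \frac{1}{T_{n+1}}\Big)\Big(1 + \frac{4 + \delta}{8T_{n+1}}\Big) \le \frac{\delta}{2}, \qquad (56)$$

where the last step uses $(1 - x)(1 + cx) \le 1$ for $x \in [0, 1]$ and $c = \frac{4+\delta}{8} \le 1$. From Eq. (56), we can see that if $\beta_{n^*} \le \delta/2$, then $\beta_n \le \delta/2$ for any $n \ge n^*$.

In the following, we find $n^*$, the smallest $n$ satisfying $\beta_n \le \delta/2$. Since $\beta_n \le \delta/2$ for all $n \ge n^*$, we have $\beta_n > \delta/2$ for any $n < n^*$. Hence, for $n \le n^*$, $2\alpha_{n-1} < 4\alpha_{n-1}\beta_{n-1}/\delta$, and

$$\begin{aligned}
\beta_n &\le \Big(1 - \frac{1}{T_n}\Big)\big((1 + \alpha_{n-1})\beta_{n-1} + 2\alpha_{n-1}\big) \\
&\le \Big(1 - \frac{1}{T_n}\Big)\Big((1 + \alpha_{n-1})\beta_{n-1} + \frac{4\alpha_{n-1}}{\delta}\beta_{n-1}\Big) \\
&= \Big(1 - \frac{1}{T_n}\Big)\Big(1 + \Big(1 + \frac{4}{\delta}\Big)\alpha_{n-1}\Big)\beta_{n-1} \\
&\le \Big(1 - \frac{1}{T_n}\Big)\Big(1 + \frac{4 + \delta}{8T_n}\Big)\beta_{n-1} \\
&\le \Big(1 - \frac{3}{8T_n}\Big)\beta_{n-1} \\
&\le \exp\Big(-\frac{3}{8T_n}\Big)\beta_{n-1} \\
&\le \exp\Big(-\frac{3}{8}\sum_{k=1}^{n}\frac{1}{T_k}\Big)\beta_0, \qquad (57)
\end{aligned}$$

where $\beta_0$ can be written as:

$$\beta_0 = \|\mu_1 - \pi_0\|_{2,1/\pi_0} = \Big\|\frac{\mu_0P_0}{\pi_0} - \mathbf{1}\Big\|_{2,\pi_0} \le e_{max}(0)\Big\|\frac{\mu_0}{\pi_0} - \mathbf{1}\Big\|_{2,\pi_0} \le \frac{1}{\sqrt{\pi_{min}(0)}}, \qquad (58)$$

where $\pi_{min}(0) = \min_i\pi_0(i) \ge \frac{1}{Z(0)} \ge \frac{1}{2^{N^2}\exp(N^2W_{max}(0))}$. So,

$$\beta_0 \le \big(2\exp(W_{max}(0))\big)^{N^2/2}. \qquad (59)$$

To guarantee $\beta_{n^*} \le \delta/2$, it suffices that $n^*$ satisfies the condition:

$$\big(2\exp(W_{max}(0))\big)^{N^2/2}\exp\Big(-\frac{3}{8}\sum_{k=1}^{n^*}\frac{1}{T_k}\Big) \le \frac{\delta}{2}, \qquad (60)$$

or,

$$\frac{3}{8}\sum_{k=1}^{n^*}\frac{1}{T_k} \ge \log\frac{2}{\delta} + \frac{N^2}{2}\big(\log 2 + W_{max}(0)\big). \qquad (61)$$

Note that $T_k$ is bounded, so there always exists an $n^*$ that satisfies the condition above. ■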
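The two per-slot bounds used in this proof, Eq. (49) and $\|\pi_{n+1} - \pi_n\|_{2,1/\pi_{n+1}} \le 2\alpha_n$, can be checked numerically. This sketch takes a $3 \times 3$ switch, random queue lengths with $|Q_{ij}(n) - Q_{ij}(n+1)| \le 1$, the example weight function of Lemma 6, and the product-form distributions of Eq. (44); the queue-length range (chosen large enough that $\alpha_n < 1$) and the helper names are illustrative assumptions.

```python
import itertools
import math
import random

def f(x):
    # Weight function of Lemma 6 with g(x) = log(e + log(1 + x)).
    return math.log1p(x) / math.log(math.e + math.log1p(x))

def f_prime(x):
    g = math.log(math.e + math.log1p(x))
    gp = 1.0 / ((1.0 + x) * (math.e + math.log1p(x)))
    return 1.0 / ((1.0 + x) * g) - math.log1p(x) * gp / g ** 2

def gibbs(states, w):
    # pi(X) proportional to exp(sum of the weights of the VOQs in X), Eq. (44).
    u = [math.exp(sum(w[c] for c in s)) for s in states]
    z = sum(u)
    return [v / z for v in u]

n = 3
cells = [(i, j) for i in range(n) for j in range(n)]
states = [c for r in range(n + 1) for c in itertools.combinations(cells, r)
          if len({i for i, _ in c}) == r and len({j for _, j in c}) == r]

rng = random.Random(7)
q_now = {c: rng.randint(50, 500) for c in cells}
q_next = {c: q_now[c] + rng.choice((-1, 0, 1)) for c in cells}  # at most one arrival/departure

pi_n = gibbs(states, {c: f(q) for c, q in q_now.items()})
pi_n1 = gibbs(states, {c: f(q) for c, q in q_next.items()})
alpha = sum(f_prime(q_now[c]) + f_prime(q_next[c]) for c in cells)  # Eq. (42)

# ||pi_{n+1} - pi_n||_{2,1/pi_{n+1}}
dist = math.sqrt(sum((a - b) ** 2 / b for a, b in zip(pi_n, pi_n1)))
print(round(alpha, 4), dist, 2 * alpha)
```

The check relies only on $f$ being increasing and concave, which is why the queue lengths can be drawn arbitrarily as long as they change by at most one per slot.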



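Lemma 12. If $f(x) = \frac{\log(1+x)}{g(x)}$, then for any $\delta > 0$, there exists a constant $C$ such that $\alpha_n T_{n+1} < \delta/8$ when $\|Q\| > C$, where $g(x)$ is a function that satisfies the following conditions:

• $g(x) > 1$ for all $x > 0$.
• $g'(x) > 0$ for all $x > 0$.
• $\lim_{x\to\infty} g(f^{-1}(x)) = \infty$.

Proof: We have:

$$f'(x) = \frac{1}{(1+x)g(x)} - \frac{\log(1+x)\,g'(x)}{g^2(x)} \le \frac{1}{1+x}. \qquad (62)$$

Also,

$$f^{-1}(x) = \exp\big(x\,g(f^{-1}(x))\big) - 1. \qquad (63)$$

Recall that

$$\alpha_n \le N^2\big(f'(Q_{min}(n)) + f'(Q_{min}(n+1))\big), \qquad (64)$$

where $Q_{min}(n) = \min_{i,j}Q_{ij}(n)$, and

$$T_{n+1} \le 2^{6N^2}\exp\big(4N^2W_{max}(n+1)\big). \qquad (65)$$

So,

$$\begin{aligned}
\alpha_nT_{n+1} &\le N^2\big(f'(Q_{min}(n)) + f'(Q_{min}(n+1))\big)\,2^{6N^2}\exp\big(4N^2W_{max}(n+1)\big) \\
&\le N^2 2^{6N^2}\exp\big(4N^2W_{max}(n+1)\big)\Big(\frac{1}{1 + Q_{min}(n)} + \frac{1}{1 + Q_{min}(n+1)}\Big) \\
&\le 2N^2 2^{6N^2}\exp\big(4N^2W_{max}(n+1)\big)\frac{1}{Q_{min}(n+1)} \\
&\le 2N^2 2^{6N^2}\exp\Big(4N^2\Big(\frac{2}{\epsilon}\Big)W_{min}(n+1)\Big)\frac{1}{f^{-1}(W_{min}(n+1))} \\
&= 2N^2 2^{6N^2}\exp\Big(\frac{8N^2}{\epsilon}W_{min}(n+1)\Big)\frac{1}{\exp\big(W_{min}(n+1)\,g(f^{-1}(W_{min}(n+1)))\big) - 1}.
\end{aligned}$$

Then, whenever $W_{min}(n+1)\,g(f^{-1}(W_{min}(n+1))) \ge \log 2$,

$$\alpha_nT_{n+1} \le 4N^2 2^{6N^2}\exp\Big[\Big(\frac{8N^2}{\epsilon} - g\big(f^{-1}(W_{min}(n+1))\big)\Big)W_{min}(n+1)\Big].$$

If $W_{max} \to \infty$, then $W_{min} \to \infty$, so that $g(f^{-1}(W_{min}(n+1))) \to \infty$, and thus $\frac{8N^2}{\epsilon} - g(f^{-1}(W_{min}(n+1))) \to -\infty$. Therefore, for any $\delta > 0$, there exists a constant $C$ such that $\alpha_nT_{n+1} < \delta/8$ holds when $\|Q\| > C$.

By proving Lemma 12, we have given the sufficient condition for the system to converge, as stated in Lemma 11. Therefore, under the randomized scheduling algorithm, the inhomogeneous Markov chain still converges to a stationary distribution, which can be expressed as Eq. (16). This finishes the proof of Lemma 6. ■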



